Abstract: The Bayesian formulation of sequentially testing
$M \geq 3$ hypotheses is studied in the context of a decentralized
sensor network system. In such a system, local sensors observe
raw observations and send quantized sensor messages to a
fusion center, which makes a final decision once it stops taking
observations. Asymptotically optimal decentralized sequential
tests are developed from a class of "two-stage" tests that allows
the sensor network system to make a preliminary decision in
the first stage and then optimize each local sensor quantizer
accordingly in the second stage. It is shown that the optimal local
quantizer at each local sensor in the second stage can be defined
as a maximin quantizer, which turns out to be a randomization
of at most $M - 1$ unambiguous likelihood quantizers (ULQ).
We first present in detail our results for the system with a
single sensor and binary sensor messages, and then extend them to
more general cases involving any finite alphabet sensor messages,
multiple sensors, or composite hypotheses.

Index Terms: Asymptotic optimality, maximin quantizer, multihypotheses
testing, sequential detection, two-stage tests, unambiguous
likelihood quantizer (ULQ).



I. Introduction 

Sequential detection or sequential hypothesis testing has
many important real-world applications such as target detection
in multiple-resolution radar (Marcus and Swerling [15]),
serial acquisition of direct-sequence spread spectrum signals
(Simon et al. [19]) and statistical pattern recognition (Fu
[7]). The centralized version, in which all observations are
available at a single central location, has been well studied.
For example, when testing $M = 2$ hypotheses, a well-known
optimal centralized test is the sequential probability ratio test
(SPRT) developed by Wald [29]; see also Wald and Wolfowitz
[30]. When testing $M \geq 3$ hypotheses, i.e., in the sequential
multihypothesis testing problem, there is no tractable closed-form
expression for the optimal centralized sequential tests,
although various asymptotically optimal sequential tests have
been proposed and investigated in the literature; see, for example,
Kiefer and Sacks [10], Lorden [14], Dragalin, Tartakovsky
and Veeravalli [5], [6].

In recent years, the decentralized version of sequential
hypothesis testing problems has gained a great amount of attention
and has been applied to a wide range of applications
such as military surveillance (Tenney and Sandell [21]), target
tracking and classification (Li et al. [13]), and data filtering
(Ye et al. [31]). Under a widely used decentralized setting, raw
data are observed at a set of geographically deployed sensors,
whereas the final decision is made at a central location, often
called the fusion center. The key feature here is that raw
observations at the local sensors are generally not directly
accessible by the fusion center, and the local sensors need
to send quantized summary messages (generally belonging
to a finite alphabet set) to the fusion center. This is due to
limited communication bandwidth and requirements of high
communication robustness.

Unfortunately, decentralized sequential hypothesis testing
problems are very challenging, and to the best of our knowledge,
existing research is restricted to testing two simple
hypotheses; see, for example, Veeravalli [26], Veeravalli, Basar
and Poor [27], Nguyen, Wainwright and Jordan [18], and
Mei [17]. It has been an open problem to find any sort of
asymptotically optimal solution for the decentralized sequential
testing problem when testing $M \geq 3$ hypotheses. This
is not surprising, because even the centralized version
requires sophisticated mathematical and statistical techniques,
and only asymptotic optimality results are available.

The primary goal of this paper is to develop a class of 
asymptotically optimal decentralized sequential procedures for 
testing $M \geq 3$ hypotheses. To do so, a major challenge we
need to overcome is finding the "optimal quantizers" that can 
best send quantized summary sensor messages from the local 
sensors to the fusion center so as to lose as little information 
as possible. Intuitively, such a quantizer should depend on the 
true distribution of the raw data, which is unknown, and thus 
stationary quantizers are generally not optimal. In addition, 
since a quantizer can be any measurable function as long as 
its range is in the given finite alphabet set, it resides in an 
infinite dimensional functional space. Hence it is essential to 
investigate the form of the "optimal quantizers" so that one 
can reduce the infinite dimensional functional space to a finite- 
dimensional parameter space for the purpose of theoretical 
analysis and numerical computation. Note that when testing
$M = 2$ hypotheses, Tsitsiklis [23] and Veeravalli et al.
[27] showed that the optimal quantizers can be found from
the family of monotone likelihood ratio quantizers (MLRQ),
whose form is defined up to a finite number of parameters.
Unfortunately, such a result does not apply to the case of
testing $M \geq 3$ hypotheses. To find the form of the optimal
quantizers for multiple hypotheses, we propose to combine three
existing methodologies: two-stage tests in Stein [20]
and Kiefer and Sacks [10] (or equivalently, tandem quantizers
in Mei [17]), unambiguous likelihood quantizers (ULQ) in
Tsitsiklis [23], and randomized quantizers (see Chernoff [3]
for a closely related topic on randomized experiments).

Fig. 1: A widely used configuration of sensor network

The remainder of the paper is organized as follows. Section
II gives a rigorous formulation of decentralized sequential
multihypothesis testing problems under a Bayesian framework.
Section III provides a general definition of two-stage tests and
discusses their implementation issues, especially those of the
randomized quantizers. To highlight our main ideas, Section
IV states our main results for a simplified sensor network
system with a single sensor and binary sensor messages:
Subsection IV-A develops asymptotically optimal decentralized
sequential tests by considering two-stage tests when the local
quantizers are the proposed "maximin quantizers," and
Subsection IV-B characterizes maximin quantizers and discusses
their numerical computation issues. Section V extends our
main results to three more general cases: (A) systems with
finite alphabet sensor messages; (B) systems with conditionally
independent multiple sensors; and (C) testing composite
hypotheses. Numerical simulation results are presented in
Section VI and concluding remarks are included in Section
VII. The technical details are provided in the appendices.

II. Notation and Problem Formulation 

As illustrated in Fig. 1, in a widely used configuration, a
sensor network consists of $K$ local sensors labeled by $S^1, \ldots, S^K$
and a fusion center which makes a final decision once it
stops taking observations. At each time step $n = 1, 2, \ldots$,
each local sensor $S^k$ observes raw data $\{X_n^k\}$ and sends
quantized summary messages $\{U_n^k\}$ to the fusion center. Here
the quantized messages $\{U_n^k\}$ are required to belong to a finite
alphabet, say, $\{0, 1, \ldots, l^k - 1\}$, due to limited communication
bandwidth or requirements of high communication robustness.
In other words, the fusion center does not have direct access to
the raw data $\{X_n^k\}$ and has to utilize the quantized sensor
messages $\{U_n^k\}$ to make a final decision. If necessary, the
fusion center can send feedback $\{V_n^k\}$ to the local sensors so
as to improve the system efficiency.

To be more rigorous, we need to further specify the form
of the sensor message functions. In this paper, we focus
on systems with full feedback, but local memories restricted
to past decisions, e.g., Case E of Veeravalli et al. [27].
Mathematically, at time $n$, for each $k = 1, 2, \ldots, K$, the
quantized sensor message at the $k$th local sensor is assumed
to be of the form

$$U_n^k = \phi_n^k(X_n^k; V_{n-1}^k) \in \{0, 1, \ldots, l^k - 1\} \tag{1}$$

where the feedback $V_{n-1}^k$ is defined by

$$V_{n-1}^k = \psi_n^k\big(U^1_{[1,n-1]}, \ldots, U^K_{[1,n-1]}\big) \tag{2}$$

and $U^k_{[1,n-1]} = (U_1^k, \ldots, U_{n-1}^k)$ denotes all past local sensor
messages. That is, the quantizer $\phi_n^k$ is a function used by
sensor $S^k$ to map the local raw data $X_n^k$ into $\{0, 1, \ldots, l^k - 1\}$,
and the choice of $\phi_n^k$ can depend on the feedback $V_{n-1}^k$ and
can be a randomized function (to be discussed later).

In decentralized sequential multihypothesis testing problems,
there are $M$ hypotheses regarding the distribution $P$ of
the raw data $\{X_n^k\}$:

$$H_m : P = P_m, \quad m = 0, 1, \ldots, M - 1. \tag{3}$$

Under each $P_m$, the raw data $X_n^k$ at local sensor $S^k$ are i.i.d.
with density $f_m^k(\cdot)$ with respect to a common underlying measure,
and the raw data $\{X_n^k\}$ are assumed to be independent
among different sensors. Hence the distributions of the raw
data under $P_m$ are completely determined by the $K$ densities
$f_m^1, \ldots, f_m^K$. Below we simply state that the true state of nature
is $m$ or $P_m$ if the hypothesis $H_m$ is true.

A decentralized sequential test $\delta$ consists of a rule to
determine the sensor messages, a stopping time $N$ used by the
fusion center and a final decision rule $D \in \{0, 1, \ldots, M - 1\}$
that chooses one of the $M$ probability measures $P_m$ based on
the information up to time $N$ at the fusion center. As in Wald
[29], Veeravalli et al. [27], and Veeravalli [26], let $c > 0$ be
the cost per time step until stopping, and let $W(m, m')$ be the
loss of making decision $D = m'$ when the true state is $P_m$. It
is standard to assume that $W(m, m) = 0$ but $W(m, m') > 0$
for any $m \neq m'$, i.e., no loss occurs if and only if a correct
decision is made. Then when the true state of nature is $P_m$,
the total expected cost of a decentralized test $\delta$ is

$$\mathcal{R}_c(\delta; m) = c E_m(N) + \sum_{m'} W(m, m') P_m\{D = m'\}$$

where $E_m$ is the expectation operator under $P_m$. In a Bayesian
formulation, we assign prior probabilities $\pi = (\pi_0, \ldots, \pi_{M-1})$
to the $M$ hypotheses $H_0, \ldots, H_{M-1}$. Hence, the Bayes risk
of the decentralized test $\delta$ is

$$\mathcal{R}_c(\delta) = \sum_{m=0}^{M-1} \pi_m \mathcal{R}_c(\delta; m). \tag{4}$$

The Bayes formulation of the decentralized sequential
multihypothesis testing problem can then be stated as follows.

Problem (P1): Minimize the Bayes risk $\mathcal{R}_c(\delta)$ in (4) among all possible
decentralized sequential multihypothesis test procedures $\delta$.
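
As a concrete reading of the Bayes risk in (4), the following minimal Python sketch evaluates it from per-hypothesis quantities. The function name `bayes_risk`, its argument layout, and the toy numbers are illustrative assumptions and do not come from the paper; in practice the expected sample sizes and decision probabilities would come from analysis or simulation of a specific test.

```python
def bayes_risk(c, prior, expected_steps, loss, decision_probs):
    """Bayes risk: sum_m pi_m * (c * E_m(N) + sum_{m'} W(m, m') * P_m{D = m'}).

    All arguments are hypothetical inputs used for illustration:
      prior[m]             -- prior probability pi_m
      expected_steps[m]    -- expected stopping time E_m(N)
      loss[m][k]           -- loss W(m, k), zero on the diagonal
      decision_probs[m][k] -- probability P_m{D = k}
    """
    num_hyp = len(prior)
    total = 0.0
    for m in range(num_hyp):
        # Expected loss from wrong decisions when the true state is m.
        error_cost = sum(loss[m][k] * decision_probs[m][k] for k in range(num_hyp))
        total += prior[m] * (c * expected_steps[m] + error_cost)
    return total


# Toy numbers (made up): three hypotheses, 0-1 loss, uniform prior.
prior = [1 / 3, 1 / 3, 1 / 3]
loss = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
decision_probs = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]
risk = bayes_risk(0.01, prior, [100, 100, 100], loss, decision_probs)
```

With these toy inputs each hypothesis contributes a sampling cost of 1.0 and an error cost of 0.02, which makes the trade-off between the two terms of the risk easy to see.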

Denote by $\delta_B^*(c)$ a Bayes solution to (P1). In Veeravalli et
al. [27], $\delta_B^*(c)$ is obtained through dynamic programming for
the simplest case of testing binary hypotheses, i.e., $M = 2$.
Unfortunately, in a general multihypothesis setting, when
$M \geq 3$, it is impossible to find $\delta_B^*(c)$, since the problem is
intractable even for the centralized version; see, for example,
Dragalin, Tartakovsky and Veeravalli [5]. This prompts us
to adopt the following asymptotic optimization approach in
which the cost $c$ per time step goes to 0.

Problem (P2): Find a family of decentralized sequential
multihypothesis testing procedures $\{\delta_A(c)\}$ that is asymptotically
optimal in the sense that

$$\lim_{c \to 0} \frac{\mathcal{R}_c(\delta_B^*(c))}{\mathcal{R}_c(\delta_A(c))} = 1. \tag{5}$$

Now let us discuss the concepts of quantizers and their
Kullback-Leibler (K-L) divergences, both of which will be
essential in our asymptotic optimality theory. A quantizer is
either a deterministic measurable function or a randomization
of some (possibly infinitely many) deterministic measurable
functions that maps the raw data into a finite alphabet set, e.g.,
the function $\phi_n^k$ in (1) is a quantizer. The quantizer is called
a deterministic quantizer if the corresponding measurable
function is deterministic. At a given local sensor $S$ (here and
below we omit the superscript $k$ for simplicity), denote by
$\Phi$ the set of all possible local deterministic quantizers $\phi$,
and let $f_m(\cdot; \phi)$ be the induced probability mass function of the
quantized message $U_n = \phi(X_n)$ when the raw observation
$X_n$ is distributed according to $f_m(\cdot)$ under $P_m$, i.e.,

$$f_m(u; \phi) = P_m(\phi(X_n) = u), \quad u = 0, 1, \ldots, l - 1. \tag{6}$$

For the deterministic quantizer $\phi$, its K-L divergences are defined by

$$I(m, m'; \phi) = \sum_{u=0}^{l-1} f_m(u; \phi) \log \frac{f_m(u; \phi)}{f_{m'}(u; \phi)} \tag{7}$$

for all $0 \leq m \neq m' \leq M - 1$. However, we need to be very careful
when defining the K-L divergences of a randomized quantizer
$\bar{\phi} = \sum_j p^j \phi^j$ that assigns probability masses $\{p^j\}$ onto some
countable subset of deterministic quantizers $\{\phi^j\} \subset \Phi$. On the
one hand, one can directly substitute $\bar{\phi}$ for $\phi$ in (7), i.e.,

$$\tilde{I}(m, m'; \bar{\phi}) = \sum_{u=0}^{l-1} f_m(u; \bar{\phi}) \log \frac{f_m(u; \bar{\phi})}{f_{m'}(u; \bar{\phi})} \tag{8}$$

where

$$f_m(u; \bar{\phi}) = P_m(\bar{\phi}(X) = u), \quad u = 0, 1, \ldots, l - 1.$$

This type of K-L divergence has been defined for randomized
quantizers in the engineering literature, e.g., Tsitsiklis
[23]. On the other hand, one can also define the K-L divergence
of the randomized quantizer $\bar{\phi}$ as the weighted average
of those of the deterministic quantizers it randomizes:

$$I(m, m'; \bar{\phi}) = \sum_j p^j I(m, m'; \phi^j), \quad 0 \leq m \neq m' \leq M - 1. \tag{9}$$

By Jensen's inequality, we have $\tilde{I}(m, m'; \bar{\phi}) \leq I(m, m'; \bar{\phi})$,
i.e., the K-L divergence defined in (8) is dominated by that in
(9); see also Appendix A for more discussion.
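
To make the difference between the two K-L divergences of a randomized quantizer tangible, here is a small Python sketch that computes both for a binary message. The helper names and the induced probability mass functions are made-up illustrations (assumptions, not values from the paper): `kl_of_mixture` follows the "plug in the mixed pmf" definition, while `kl_weighted_average` follows the weighted-average definition.

```python
import math


def kl_divergence(p, q):
    """K-L divergence sum_u p[u] * log(p[u] / q[u]) between two pmfs."""
    return sum(pu * math.log(pu / qu) for pu, qu in zip(p, q) if pu > 0)


def mixture_pmf(weights, pmfs):
    """pmf of the message when the receiver does not know which quantizer was used."""
    size = len(pmfs[0])
    return [sum(w * pmf[u] for w, pmf in zip(weights, pmfs)) for u in range(size)]


def kl_of_mixture(weights, pmfs_m, pmfs_mp):
    """Plug the mixed pmfs under states m and m' into the K-L formula."""
    return kl_divergence(mixture_pmf(weights, pmfs_m), mixture_pmf(weights, pmfs_mp))


def kl_weighted_average(weights, pmfs_m, pmfs_mp):
    """Average the K-L divergences of the individual deterministic quantizers."""
    return sum(w * kl_divergence(p, q) for w, p, q in zip(weights, pmfs_m, pmfs_mp))


# Hypothetical example: two deterministic binary quantizers, mixed 50/50.
weights = [0.5, 0.5]
pmfs_m = [[0.8, 0.2], [0.3, 0.7]]   # induced pmfs under state m
pmfs_mp = [[0.4, 0.6], [0.6, 0.4]]  # induced pmfs under state m'

tilde_i = kl_of_mixture(weights, pmfs_m, pmfs_mp)
bar_i = kl_weighted_average(weights, pmfs_m, pmfs_mp)
```

In this toy case the mixture nearly washes out the difference between the two states, so the first quantity is much smaller than the second, in line with the Jensen-type inequality stated above.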

To the best of our knowledge, the K-L divergence in (9)
has not been studied in the literature so far, and it turns out
that it will play a central role in our asymptotic theory. The
reason why our asymptotic theory involves the K-L divergence
in (9) instead of that in (8) is our novel way of
implementing randomized quantizers to minimize loss of
information. Roughly speaking, when implementing randomized
quantizers, it is essential for the fusion center to know which
specific deterministic quantizer is going to be used at the local
sensor at each time step, since otherwise the fusion center can
be confused by randomized quantizers and its decision making
will be less efficient. This issue will be discussed further
in Subsection III-B. Also note that a deterministic quantizer
can also be thought of as a randomized quantizer that assigns
probability one to itself. Denote by $\bar{\Phi}$ the set of all possible
quantizers at the local sensor $S$, deterministic or randomized.

Throughout our paper we make the following standard 
assumption to ensure the finiteness of the expectation of the 
raw data's log-likelihood ratios. 

Assumption 1. For any two different states $0 \leq m \neq m' \leq M - 1$
and local sensor $S^k$,

$$0 < E_m\left[\log \frac{f_m^k(X_n^k)}{f_{m'}^k(X_n^k)}\right] < \infty.$$

In the literature, researchers often assume a uniform
bound on the second moments of the log-likelihood ratio
$\log \frac{f_m^k(X_n^k)}{f_{m'}^k(X_n^k)}$ under $P_m$. See, for example, Kiefer and Sacks
[10] and Mei [17]. Here our assumption is much weaker, and it
turns out that it will be sufficient for the first-order asymptotic
optimality under our setting.

III. Two-Stage Test Procedures 

In this section, we introduce a class of "two-stage" decentralized
sequential tests in which each local sensor uses two
stationary (possibly randomized) local quantizers with at most
one switch between these two quantizers. Tests of this type
are useful because they allow the fusion center to first make
a preliminary guess about the true state of nature and then
optimize the procedure accordingly.

To highlight our main ideas, in the present and next sections
we assume that the sensor network system consists of a single
local sensor, i.e., $K = 1$, and all quantized messages are binary,
i.e., $U_n \in \{0, 1\}$. Extensions to general cases are presented in
Section V. To simplify notation, we drop all the superscripts
denoting the sensors. That is, in this and the next section we assume
that one observes raw data $X_1, X_2, \ldots$, which are i.i.d. with
density $f_m(x)$ under the hypothesis $H_m$. The final decision is
based on quantized messages $U_n = \phi_n(X_n; V_{n-1}) \in \{0, 1\}$
with the feedback $V_{n-1} = \psi_{n-1}(U_1, \ldots, U_{n-1})$. For a given
(randomized) quantizer $\bar{\phi}$, the K-L divergence of $P_{m'}$ from
$P_m$ is $I(m, m'; \bar{\phi})$ defined in (9).

A. Our Proposed Test 

Our proposed two-stage test $\delta(c)$ can be defined as follows.
In the first stage of $\delta(c)$, the local sensor can use any
"reasonable" stationary deterministic quantizer and the fusion
center needs to make a preliminary guess about the true state
of nature. The only requirement is that as the cost $c \to 0$, the
probabilities of making an incorrect preliminary guess go to zero
but the time steps taken at this first stage become negligible
as compared to those of the overall procedure (or the second
stage).

To be more concrete, let $u(c) \in (0, 1/2)$ be a function
of $c$ such that $u(c) \to 0$ and $\log u(c) / \log c \to 0$ when
$c \to 0$, e.g., $u(c) = 1/|\log c|$. Choose a deterministic
quantizer $\phi^0$ such that $I(m, m'; \phi^0) > 0$ for any two states
$0 \leq m \neq m' \leq M - 1$, and let the local sensor use
the stationary quantizer $\phi^0$ to send i.i.d. sensor messages
$U_n = \phi^0(X_n)$ to the fusion center. Then the fusion center
faces a classical sequential detection problem with the i.i.d.
sensor messages $U_n$ as inputs, and thus it is intuitively
appealing to make a preliminary decision based on posterior
distributions. Specifically, at each time step $n = 0, 1, \ldots$,
the fusion center updates recursively the posterior distribution
$(\pi_{0,n}, \pi_{1,n}, \ldots, \pi_{M-1,n})$ (with $\pi_{m,0} = \pi_m$ given by the prior) as follows:

$$\pi_{m,n} = \frac{\pi_{m,n-1} f_m(U_n; \phi^0)}{\sum_{0 \leq m' \leq M-1} \pi_{m',n-1} f_{m'}(U_n; \phi^0)}. \tag{10}$$

Then the fusion center will stop the first stage at time step

$$N_0 = \min\Big\{n \geq 0 : \max_{0 \leq m \leq M-1} \pi_{m,n} \geq 1 - u(c)\Big\}$$

and when stopped, the fusion center makes a preliminary
decision

$$D_0 = \arg\max_{0 \leq m \leq M-1} \pi_{m,N_0}.$$

Note that the preliminary decision $D_0$ is well-defined because
the maximum value of $\pi_{m,N_0}$ is attained at only one index
$m$ due to the definition of $N_0$ and the fact that $u(c) < 1/2$.
For the purpose of practical implementation, the preliminary
decision $D_0$ can be transmitted to the local sensor through a
feedback of $\log_2 M$ bits.
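
The first stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the table `pmfs[m][u]` standing in for the induced pmf of the first-stage quantizer under state $m$, and the toy numbers are all hypothetical, and the stopping test uses "greater than or equal to" the threshold $1 - u(c)$.

```python
def update_posterior(posterior, message, pmfs):
    """One Bayes update of the posterior; pmfs[m][u] is the pmf of message u under state m."""
    weighted = [p * pmfs[m][message] for m, p in enumerate(posterior)]
    total = sum(weighted)
    return [w / total for w in weighted]


def argmax_index(values):
    """Index of the largest entry (first one in case of ties)."""
    return max(range(len(values)), key=values.__getitem__)


def run_first_stage(prior, messages, pmfs, u_c):
    """Return (N_0, D_0, posterior at N_0), or None if the messages run out first."""
    posterior = list(prior)
    if max(posterior) >= 1.0 - u_c:
        return 0, argmax_index(posterior), posterior
    for n, message in enumerate(messages, start=1):
        posterior = update_posterior(posterior, message, pmfs)
        if max(posterior) >= 1.0 - u_c:
            return n, argmax_index(posterior), posterior
    return None


# Hypothetical example: three states, binary messages, uniform prior.
pmfs = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
result = run_first_stage([1 / 3, 1 / 3, 1 / 3], [0] * 10, pmfs, u_c=0.1)
n0, d0, posterior_at_n0 = result
```

With a run of zeros as messages, the posterior mass of state 0 crosses 0.9 at the fourth message, so the sketch stops there and reports state 0 as the preliminary decision.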

In the second stage of our proposed test $\delta(c)$, the local
sensor will switch to another stationary (likely randomized)
quantizer that may depend on the preliminary decision $D_0$.
Without loss of generality, we assume that the local sensor uses
the stationary quantizer $\bar{\phi}_m$ when the preliminary decision at
the first stage is $D_0 = m$ for $m = 0, 1, \ldots, M - 1$. Here we
put a bar over $\phi_m$ to emphasize that it is likely a randomized
quantizer when optimized, and we will postpone the detailed
discussion about how to implement randomized quantizers to
the next subsection.

Now at the second stage, the fusion center shall ignore the
preliminary decision $D_0$ and continue to update the posterior
distribution $(\pi_{0,n}, \ldots, \pi_{M-1,n})$ with the sensor messages
generated from the new quantizer $\bar{\phi}_m$ when $D_0 = m$ (how to
update will be discussed in the next subsection). Then the
fusion center will stop the second stage (hence the whole
procedure) at time step

$$N = \min\Big\{n \geq N_0 : \max_{0 \leq m \leq M-1} \pi_{m,n} \geq 1 - c\Big\} \tag{11}$$

and when stopped, the fusion center makes a final decision

$$D = \arg\max_{0 \leq m \leq M-1} \pi_{m,N}.$$
From the asymptotic point of view, many other possible
decision rules can also be used at the fusion center. For
instance, let $r_{m,n} = \sum_{m'=0}^{M-1} \pi_{m',n} W(m', m)$ be the average
posterior cost when making a decision $m$ at time $n$, and then
the fusion center can stop the second stage at time

$$N = \min\{N_m : 0 \leq m \leq M - 1\} \tag{12}$$

where

$$N_m = \min\{n \geq N_0 : r_{m,n} \leq c\}, \quad m = 0, 1, \ldots, M - 1. \tag{13}$$
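
The posterior-cost stopping rule can be checked at a single time step with the short Python sketch below. The function names are hypothetical, and returning the decision with the smallest posterior cost is an assumption made for illustration; the text above only specifies when to stop.

```python
def posterior_costs(posterior, loss):
    """Average posterior cost of each decision m: sum over m' of posterior[m'] * W(m', m)."""
    num_hyp = len(posterior)
    return [
        sum(posterior[k] * loss[k][m] for k in range(num_hyp))
        for m in range(num_hyp)
    ]


def stop_and_decide(posterior, loss, c):
    """Return (stop, decision): stop once some decision has posterior cost at most c."""
    costs = posterior_costs(posterior, loss)
    best = min(range(len(costs)), key=costs.__getitem__)
    return costs[best] <= c, best


# Hypothetical posterior and a 0-1 loss for three states.
zero_one_loss = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
posterior = [0.999, 0.0005, 0.0005]
stop_loose, decision = stop_and_decide(posterior, zero_one_loss, c=0.01)
stop_strict, _ = stop_and_decide(posterior, zero_one_loss, c=0.0001)
```

Under a 0-1 loss the posterior cost of deciding $m$ is simply one minus the posterior probability of $m$, so this rule reduces to a posterior threshold; the two rules differ once the losses are unequal.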



In our experience, the stopping time $N$ defined in (12)
is slightly better than that in (11) in finite-sample numerical
simulations, especially when the costs $W(m', m)$ are not a
simple 0-1 cost function. Moreover, at the second stage, our
proposed test will continue to update the posterior distribution
instead of starting afresh as required by the two-stage tests in
Section V of Kiefer and Sacks [10] or in Section IV of Mei
[17]. The main reason is to further utilize information gathered
from the first stage so as to improve the efficiency in
finite-sample simulations, although it also requires extra treatment
in the asymptotic arguments.

B. Implementing Randomized Quantizers and Updating Posterior Distribution

When testing $M \geq 3$ hypotheses, randomized quantizers
are likely needed in the second stage in order to develop the
optimal two-stage tests, and thus it is necessary to determine
the appropriate approach to implement them as well as how to
update posterior distributions at the fusion center, especially
at the second stage. Assume a randomized quantizer is given
by $\bar{\phi} = \sum_j p^j \phi^j$. The key requirement for randomization
in our two-stage test is that the fusion center must know
which deterministic quantizer is picked to quantize the raw
observation, since otherwise the randomization can cause
confusion at the fusion center. The most straightforward (though
practically infeasible) implementation is to let the fusion center
do the randomization directly. Specifically, at time step $n$ the
fusion center will choose the deterministic quantizer $\phi^j$ with
probability $p^j$, say choosing the deterministic quantizer $\phi^{j(n)}$.
Through a feedback from the fusion center, the local sensor
will then use the chosen deterministic quantizer $\phi^{j(n)}$ at time
step $n$ to quantize the raw observation. After receiving the
quantized sensor message $U_n$ at time step $n$, the fusion center
then updates the posterior distribution as follows:

$$\pi_{m,n} = \frac{\pi_{m,n-1} f_m(U_n; \phi^{j(n)})}{\sum_{m'=0}^{M-1} \pi_{m',n-1} f_{m'}(U_n; \phi^{j(n)})} \tag{14}$$

because the fusion center knows that $U_n$ comes from the
deterministic quantizer $\phi^{j(n)}$ at time step $n$.



A theoretically equivalent but more feasible implementation
in practice is to adopt a "pseudo-randomization" at the local
level through the so-called "block design" (see Section V of
Kiefer and Sacks [10]). To be specific, suppose $\bar{\phi}$ randomizes
a finite number (say $i$) of deterministic quantizers, and all
$p^j$'s are (or can be approximated by) rational numbers with
a common denominator $b$. Then we divide the time steps
into blocks of size $b$, and within each block, the raw data
are quantized with deterministic quantizers $\{\phi^1, \ldots, \phi^i\}$
following a fixed order such that each $\phi^j$ is used exactly
$p^j b$ times. Under this implementation, the fusion center again
knows which deterministic quantizer is used at each time step,
and thus can update the posterior distribution as in (14).
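
A minimal Python sketch of the block design follows. It assumes the probability masses are given as exact rationals and uses a "sorted by index" order inside each block, which is just one arbitrary choice of fixed order; the function names and the zero-based quantizer indices are illustrative assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def block_schedule(probabilities):
    """Build one block in which quantizer j appears exactly p_j * b times."""
    masses = [Fraction(p) for p in probabilities]
    if sum(masses) != 1:
        raise ValueError("probability masses must sum to one")
    # Block size b: least common multiple of the denominators.
    block_size = reduce(lambda a, b: a * b // gcd(a, b),
                        (mass.denominator for mass in masses), 1)
    block = []
    for j, mass in enumerate(masses):
        block.extend([j] * int(mass * block_size))
    return block_size, block


def quantizer_index(n, block):
    """Index of the deterministic quantizer used at (1-based) time step n."""
    return block[(n - 1) % len(block)]


# Hypothetical masses 1/2, 1/3, 1/6 on three deterministic quantizers.
block_size, block = block_schedule([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)])
```

Because the schedule is a deterministic function of the time step, both the local sensor and the fusion center can evaluate `quantizer_index` on their own, so no per-step feedback is needed to agree on the quantizer in use.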

We would like to point out that our implementation of
randomized quantizers is very different from the existing
implementations in the literature (see Tsitsiklis [23]). In the
latter the randomization is done at the local level in the
sense that the local sensor randomly picks one of the
deterministic quantizers $\phi^j$, and the fusion center will only get
the quantized message $U_n$ without knowing exactly which
deterministic quantizer is used to generate $U_n$. In this case,
to update the posterior distribution, the fusion center has to plug
in $\bar{\phi}$ (instead of $\phi^{j(n)}$) into (14), i.e.,

$$\pi_{m,n} = \frac{\pi_{m,n-1} f_m(U_n; \bar{\phi})}{\sum_{m'=0}^{M-1} \pi_{m',n-1} f_{m'}(U_n; \bar{\phi})}.$$

Since our proposed implementation and the local randomization
implementation lead to different likelihood ratios, it is not
surprising that there are two different kinds of K-L divergences
for a randomized quantizer in Section II: one defined in (8)
and the other in (9).

IV. Main Results 

In the present section, we show that a two-stage test can be
an asymptotically optimal solution to problem (P2) by carefully
choosing the quantizers used in the second stage. We also
give characterizations of these optimal quantizers as well as
the corresponding numerical computation.

A. Maximin Quantizers and Asymptotic Theory 

Let us begin with the definition of some useful information
numbers. For a given (deterministic or randomized) quantizer
$\bar{\phi} \in \bar{\Phi}$, define

$$I(m; \bar{\phi}) = \min_{m' \neq m} I(m, m'; \bar{\phi}) \tag{15}$$

for each state $m = 0, 1, \ldots, M - 1$. That is, $I(m; \bar{\phi})$ characterizes
the least divergence from the state $m$ to other states.

The following theorem, whose proof is presented in Appendix B,
establishes the asymptotic properties of a two-stage
test $\delta(c)$ as the cost $c$ goes to 0.

Theorem 4.1. Let $\delta(c)$ be a two-stage test with
$\{\bar{\phi}_0, \ldots, \bar{\phi}_{M-1}\}$ being the set of (possibly randomized)
quantizers used in its second stage. Assume each $\bar{\phi}_m$
randomizes a finite number of deterministic quantizers,
and suppose that the prior probabilities $\pi_m > 0$ and
$I(m; \bar{\phi}_{m'}) > 0$ for all states $m = 0, 1, \ldots, M - 1$ and
$m' = 0, 1, \ldots, M - 1$. Then as $c \to 0$, the number of time steps $N$ taken
by the two-stage test $\delta(c)$ satisfies

$$E_m\{N\} = (1 + o(1)) \frac{|\log c|}{I(m; \bar{\phi}_m)}, \quad m = 0, 1, \ldots, M - 1, \tag{16}$$

and the final decision $D$ of the two-stage test $\delta(c)$ satisfies

$$P_m\{D \neq m\} = O(c), \quad m = 0, 1, \ldots, M - 1. \tag{17}$$

Thus, the Bayes risk of the two-stage test $\delta(c)$ is

$$\mathcal{R}_c(\delta) = c |\log c| (1 + o(1)) \sum_{m=0}^{M-1} \frac{\pi_m}{I(m; \bar{\phi}_m)}. \tag{18}$$

In light of Theorem 4.1, to asymptotically minimize the
Bayes risk within the class of two-stage tests, it is clear that
one should maximize the information numbers $I(m; \bar{\phi}_m)$ for
$m = 0, 1, \ldots, M - 1$. This leads to a natural definition of the
optimal quantizers that we should use in the second stage:

Definition 4.1. For $m = 0, 1, \ldots, M - 1$, define the maximin
quantizer with respect to $P_m$ as

$$\bar{\phi}_m^{\max} = \arg\sup_{\bar{\phi} \in \bar{\Phi}} I(m; \bar{\phi})$$

and define the corresponding maximin information number by

$$I(m) = \sup_{\bar{\phi} \in \bar{\Phi}} I(m; \bar{\phi}).$$

As shown later in Theorems 4.3 and 5.1, the supremum of
$I(m; \bar{\phi})$ is attainable, and the maximin quantizers not only
exist, but also can be realized as randomizations of a finite
number of deterministic quantizers. Now we are ready to
investigate the asymptotic optimality properties of the two-stage
test when the maximin quantizers are used in the
second stage. Denote by $\delta_A(c)$ such a two-stage test. Then
by Theorem 4.1 we have

$$\mathcal{R}_c(\delta_A(c)) = (1 + o(1)) c |\log c| \sum_{m=0}^{M-1} \frac{\pi_m}{I(m)} \tag{19}$$

m— 

as $c \to 0$. What is surprising is that $\delta_A(c)$ is not only
the best one within the class of two-stage tests, but also
asymptotically optimal among all possible decentralized tests.
A key step in the proof is the following important theorem,
which establishes asymptotic lower bounds on the expected
time steps of any decentralized test with "suitably small"
probabilities of making incorrect decisions.

Theorem 4.2. Assume that $\delta(c)$ is a decentralized (not
necessarily two-stage) test that makes a final decision $D$ and

$$P_m\{D \neq m\} = O(c \log c), \quad m = 0, 1, \ldots, M - 1,$$

as $c \to 0$. Then the number of time steps $N$ taken by $\delta(c)$ satisfies

$$E_m\{N\} \geq \frac{|\log c| - \log|\log c| + O(1)}{I(m)} = (1 + o(1)) \frac{|\log c|}{I(m)} \tag{20}$$

for all $m = 0, 1, \ldots, M - 1$.

The proof of Theorem 4.2 is presented in Appendix C.
The first-order asymptotic lower bound will be sufficient to
prove the first-order asymptotic optimality of $\delta_A(c)$; we
present a higher-order lower bound because of
its potential usefulness in higher-order analysis in further
research. By relation (19) and Theorem 4.2, we have

Corollary 4.1. The procedure $\delta_A(c)$ is asymptotically Bayes
up to first-order.

Proof: Let $\delta_B^*(c)$ be the Bayes procedure. By definition,
$\mathcal{R}_c(\delta_B^*(c)) \leq \mathcal{R}_c(\delta_A(c))$. Using the relation (19) and the
definition of Bayes risk $\mathcal{R}_c(\delta_B^*(c))$, the probabilities for the
Bayes procedure $\delta_B^*(c)$ to make incorrect decisions are at most
$O(c \log c)$. By Theorem 4.2, the stopping time $\tau^*$ of the Bayes
procedure $\delta_B^*(c)$ satisfies (20). Now using the definition of
Bayes risk again, for any test, the cost of time steps taken to
make the final decision is only a portion of the Bayes risk. In
particular,

$$\mathcal{R}_c(\delta_B^*(c)) \geq c \sum_m \pi_m E_m\{\tau^*\} \geq (1 + o(1)) c |\log c| \sum_m \frac{\pi_m}{I(m)}.$$

Combining all arguments yields that $\mathcal{R}_c(\delta_B^*(c)) / \mathcal{R}_c(\delta_A(c)) \to 1$
as $c \to 0$, completing the proof of the corollary. ■
It is useful to point out that the test $\delta_A(c)$ is asymptotically
Bayes mainly because the local sensor uses the maximin
quantizers $\bar{\phi}_m^{\max}$ in the second stage. Since the maximin
quantizers do not depend on the prior distribution $\{\pi_m\}$, it is
easy to see from (16) and (17) that the asymptotic optimality
properties of $\delta_A(c)$ are actually robust with respect to $\{\pi_m\}$
as long as all prior probabilities are positive. Likewise, the
asymptotic Bayes properties still hold if the stopping times
of $\delta_A(c)$ at the fusion center are replaced by other efficient
multihypothesis tests, e.g., those in Dragalin, Tartakovsky and
Veeravalli [5], [6].

B. Characterizing the Maximin Quantizers. 

In this subsection, we provide a deeper understanding of
the maximin quantizers $\{\bar{\phi}_m^{\max} : m = 0, 1, \ldots, M - 1\}$ and
also illustrate how to compute them explicitly when the sensor
messages are binary.

Let us first introduce the unambiguous likelihood quantizer
(ULQ), which was first proposed in Tsitsiklis [23] as a
generalization of the monotone likelihood ratio quantizer (MLRQ).
For notational convenience, here we give the definition of ULQ
only for the case of binary sensor messages, and the general
definition will be provided in Definition 5.1 in Subsection V-A.



Definition 4.2. A deterministic quantizer $\phi \in \Phi$ is said to
be an unambiguous likelihood quantizer if there exist real
numbers $\{a_m : m = 0, \ldots, M - 1\}$ such that

$$\phi(X) = \mathbf{1}\Big\{\sum_{m=0}^{M-1} a_m f_m(X) > 0\Big\} \tag{21}$$

and for any $0 \leq m' \leq M - 1$, the set $\{a_m\}$ satisfies the
following condition

$$P_{m'}\Big\{\sum_{m=0}^{M-1} a_m f_m(X) = 0\Big\} = 0. \tag{22}$$

When relation (22) holds for any set of $\{a_m\}$ that are
not simultaneously zero, the set of pdf's $\{f_m\}$ are said to
be linearly independent. With the definition of ULQs, the
following theorem characterizes the form of the maximin
quantizers $\bar{\phi}_m^{\max}$. The proof is very technical and is deferred to
Appendix A.

Theorem 4.3. For each $m = 0, 1, \ldots, M - 1$, the maximin
quantizer $\bar{\phi}_m^{\max}$ exists and can be chosen as a randomization
of at most $M - 1$ deterministic quantizers. Moreover, if the
pdf's $\{f_m\}$ are linearly independent, then it can actually be
chosen as a randomization of at most $M - 1$ deterministic
ULQs.

Clearly, when testing $M = 2$ simple hypotheses, the
ULQs become MLRQs, and thus the maximin quantizer in
the second stage is just a deterministic MLRQ, which is
consistent with the results in Mei [17].

Note that Theorem 4.3 reduces the search for the maximin
quantizers from an infinite-dimensional function space to a
parameter space of dimension $O(M^2)$. To see this, fix a
state $m$ and define $M^2 - 1$ parameters as probability masses
$\{p_m^j : 1 \leq j \leq M - 1,\ p_m^j \geq 0,\ \sum_{j=1}^{M-1} p_m^j = 1\}$, and
ULQ coefficients $\{a^j_{m,m'} : 1 \leq j \leq M - 1,\ 0 \leq m' \leq M - 1,\ \sum_{m'=0}^{M-1} (a^j_{m,m'})^2 = 1\}$.
Based on every combination
of these parameters, define by $\bar{\phi}$ the quantizer randomizing
$M - 1$ ULQs: $\bar{\phi} = \sum_{j=1}^{M-1} p_m^j \phi_m^j$, where

$$\phi_m^j(X) = \mathbf{1}\Big\{\sum_{m'=0}^{M-1} a^j_{m,m'} f_{m'}(X) > 0\Big\}.$$

The maximin quantizer $\bar{\phi}_m^{\max}$ can then be found as the $\bar{\phi}$ that
maximizes

$$\min_{l \neq m} I(m, l; \bar{\phi}) \tag{23}$$

among all possible combinations of
$\{p_m^j, a^j_{m,m'}\}_{1 \leq j \leq M-1,\ 0 \leq m' \leq M-1}$.

To further reduce computational complexity of the maximin 
quantizers, we can apply the following lemma which provides 
a sufficient condition that a deterministic MLRQ quantizer is 
the maximin quantizer. 

Lemma 4.1. Given $m' \ne m$, let $\phi_{m,m'}$ be the deterministic MLRQ that maximizes the K-L divergence of $m'$ from $m$, i.e.,

$$\phi_{m,m'} = \arg\sup_{\phi} I(m, m'; \phi).$$

If there exists a state $m' \ne m$ such that for any other state $m'' \ne m$:

$$I(m, m''; \phi_{m,m'}) \ge I(m, m'; \phi_{m,m'})$$

then $\phi_{m,m'}$ is also the maximin quantizer for state $m$.

Proof: By definition,

$$I(m; \phi_{m,m'}) = \min_{m'' \ne m} I(m, m''; \phi_{m,m'}) = I(m, m'; \phi_{m,m'}).$$

Take any quantizer $\phi$. Then

$$I(m; \phi) \le I(m, m'; \phi) \le I(m, m'; \phi_{m,m'}) = I(m; \phi_{m,m'})$$

and thus $\phi_{m,m'}$ is the maximin quantizer for state $m$. ■
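The sufficient condition of Lemma 4.1 is easy to check numerically. The sketch below assumes unit-variance normal observations with means (-0.5, 0, 1), for which an MLRQ reduces to a threshold rule $I(X > \lambda)$; the grid search and all function names are illustrative assumptions rather than the authors' implementation.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kl_threshold(theta_m, theta_alt, lam):
    """I(m, m'; phi) for phi(X) = I(X > lam) when X ~ N(theta, 1)."""
    p = 1.0 - norm_cdf(lam - theta_m)
    q = 1.0 - norm_cdf(lam - theta_alt)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def best_threshold(theta_m, theta_alt, lo=-3.0, hi=3.0, steps=60001):
    """Grid search for the threshold maximizing I(m, m'; phi)."""
    best_lam, best_val = lo, -1.0
    for k in range(steps):
        lam = lo + (hi - lo) * k / (steps - 1)
        val = kl_threshold(theta_m, theta_alt, lam)
        if val > best_val:
            best_lam, best_val = lam, val
    return best_lam, best_val

def lemma_4_1_holds(thetas, m, m_alt):
    """Check I(m, m''; phi_{m,m'}) >= I(m, m'; phi_{m,m'}) for all other m''."""
    lam, val = best_threshold(thetas[m], thetas[m_alt])
    others = [k for k in range(len(thetas)) if k not in (m, m_alt)]
    ok = all(kl_threshold(thetas[m], thetas[k], lam) >= val for k in others)
    return ok, lam, val

# State 0 against its nearest neighbour, state 1.
result = lemma_4_1_holds((-0.5, 0.0, 1.0), 0, 1)
print(result)
```

When the check succeeds, the returned threshold quantizer is already the maximin quantizer for that state, so the higher-dimensional search over randomized ULQs can be skipped.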

V. Extensions 



Section IV deals with the simplest case when the network 
only has a single sensor with binary sensor messages. In 
this section, we extend our results to three more general 
scenarios: 1) the sensor messages belong to a finite alphabet 
(not necessarily binary); 2) there is more than one sensor in 
the network (though observations are independent between 
different sensors); and 3) the hypotheses are composite. 

A. Sensor Messages Belonging to a Finite Alphabet 

Suppose the network still consists of only one sensor, but now the sensor messages belong to a finite alphabet, say, $\{0, 1, \ldots, l-1\}$ with $l \ge 2$. In this scenario, the definitions of two-stage tests (Subsection III-A) and maximin quantizers (Subsection IV-A) are still applicable, and Theorem 4.1 and Theorem 4.2 also hold. The only change is to Theorem 4.3, as we need to consider the following general definition of ULQ, which was originally proposed in Tsitsiklis [23] and includes Definition 4.2 as a special case.



Definition 5.1. When the sensor messages belong to a finite alphabet $\{0, 1, \ldots, l-1\}$, a deterministic quantizer $\phi \in \Phi$ is said to be an unambiguous likelihood quantizer (ULQ) if and only if there exist real numbers $\{a_{i,m} : 0 \le i \le l-1,\ 0 \le m \le M-1\}$ such that

$$\phi(X) = \arg\min_{0 \le i \le l-1} \sum_{m=0}^{M-1} a_{i,m} f_m(X) \qquad (24)$$

and the probability of a tie is zero under every $P_m$ for $m = 0, 1, \ldots, M-1$.



With this definition, Theorem 4.3 can be generalized as 
follows. 

Theorem 5.1. Suppose the sensor messages belong to a finite alphabet $\{0, 1, \ldots, l-1\}$ with $l \ge 2$. Then for $m = 0, 1, \ldots, M-1$, the maximin quantizer $\phi_m^{\max}$ can be realized as a randomization of at most $M-1$ deterministic quantizers. Moreover, for every $m$, there exists a sequence of quantizers $\{\phi_{m,i}\}$, each randomizing at most $M-1$ ULQs, such that $I(m; \phi_{m,i}) \to I(m)$, that is, the maximin quantizer can be approximated by $\{\phi_{m,i}\}$.

The proof of Theorem 5.1 is presented in Appendix A. Note that there is a significant difference between Theorem 4.3 and Theorem 5.1. When the sensor messages are binary (i.e., $l = 2$), we are sure that the maximin quantizers can be attained by randomizing $M-1$ ULQs if the pdfs $f_0, \ldots, f_{M-1}$ are linearly independent. However, this may not be true for $l \ge 3$. Fortunately, since the maximin quantizers can always be approximated as described in Theorem 5.1, the issue is not essential from the viewpoint of numerical computation, as we can compute the maximin quantizers (or their approximations) in the same way as in Subsection IV-B except that each ULQ is now associated with an $l$ by $M$ matrix $A = \{a_{i,m}\}$.
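To make the role of the $l$ by $M$ coefficient matrix concrete, here is a minimal Python sketch of the rule in (24), assuming unit-variance normal densities. The means and the matrix below are hypothetical choices for illustration only.

```python
import math

def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def ulq(x, coeffs, means):
    """Rule (24): return argmin over i of sum_m coeffs[i][m] * f_m(x).

    coeffs is an l-by-M matrix (list of rows); ties are ignored because
    they have probability zero for a ULQ.
    """
    densities = [normal_pdf(x, mu) for mu in means]
    scores = [sum(a * f for a, f in zip(row, densities)) for row in coeffs]
    return min(range(len(scores)), key=scores.__getitem__)

means = (-0.5, 0.0, 0.5)   # assumed pdfs f_0, f_1, f_2
# Row i charges every density except f_i, so the argmin picks the
# hypothesis whose density is largest at x (a ternary message, l = 3).
A = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
messages = [ulq(x, A, means) for x in (-1.0, 0.1, 2.0)]
print(messages)  # [0, 1, 2]
```

Other choices of the matrix give other partitions of the observation space; the numerical search described above ranges over such matrices.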

Another benefit of Theorem 5.1 is that it can deal with the case when the sensor messages are binary but the pdfs are not linearly independent. Such a case was not addressed by Theorem 4.3, and Theorem 5.1 shows that although the maximin quantizer $\phi_m^{\max}$ may no longer be a randomization of at most $M-1$ ULQs, it can still be approximated by a sequence of quantizers $\{\phi_{m,i}\}$, each one randomizing at most $M-1$ ULQs.

B. Multiple Sensors 

We now assume that there are $K \ge 2$ sensors in the system in which all raw observations are independent from sensor to sensor conditioned on each $P_m$, $m = 0, 1, \ldots, M-1$. In the following notation, we use superscripts to denote different sensors as in Section II. For simplicity, we assume the sensor messages are binary, since the extension to finite-alphabet sensor messages can be easily done as in Subsection V-A. The key to extending our results is to treat the quantizers in Sections III and IV as vectors of quantizers. Specifically, a (deterministic) quantizer vector is $\boldsymbol{\phi} = (\phi^1, \ldots, \phi^K)$, where each local sensor $S^k$ uses the deterministic quantizer $\phi^k$ to quantize the raw data. Denote by $\Phi^{(K)}$ the set of all (deterministic) quantizer vectors, and define a randomized quantizer vector

$$\boldsymbol{\phi} = \sum_j p^j \boldsymbol{\phi}^j$$

where $\boldsymbol{\phi}^j = (\phi^{1,j}, \ldots, \phi^{K,j}) \in \Phi^{(K)}$ and $\{p^j\}$ are the probability masses assigned to the set of deterministic quantizer vectors $\{\boldsymbol{\phi}^j\} \subset \Phi^{(K)}$. Let the set of all quantizer vectors be $\bar{\Phi}^{(K)}$ (a deterministic quantizer can be viewed as a randomized one which assigns probability one to itself). The implementation of a randomized quantizer vector $\boldsymbol{\phi} = \sum_j p^j \boldsymbol{\phi}^j$ is the same as that in Subsection III-B, i.e., the fusion center knows which deterministic quantizer vector is picked, either by letting the fusion center conduct the randomization directly or by using the pseudo-randomization block design at the local sensor level. Likewise, for a deterministic quantizer vector $\boldsymbol{\phi} = (\phi^1, \ldots, \phi^K)$, the K-L divergence of state $m'$ from state $m$ is defined as

$$I(m, m'; \boldsymbol{\phi}) = \sum_{k=1}^{K} I(m, m'; \phi^k) \qquad (25)$$

and for a randomized quantizer vector $\boldsymbol{\phi} = \sum_j p^j \boldsymbol{\phi}^j$, the K-L divergence is a weighted average as in Section III:

$$I(m, m'; \boldsymbol{\phi}) = \sum_j p^j I(m, m'; \boldsymbol{\phi}^j). \qquad (26)$$

Now the maximin quantizer vectors $\{\boldsymbol{\phi}_m^{\max}\}$ and maximin information numbers $\{I(m)\}$ for quantizer vectors can be defined in exactly the same way as in Subsection IV-A, and the theories developed for single-sensor networks, i.e., Theorems 4.1 to 4.3, also hold for the multiple-sensor case once the quantizers are replaced by quantizer vectors.

A special case is when the sensors are homogeneous, 
i.e., when the observations are independent and identically 
distributed among different sensors. In this case, the maximin 
quantizer vectors are simply replicates of the maximin quan- 
tizers in the corresponding single-sensor case, and such results 
are summarized in the following proposition. 



Proposition 5.1. Assume that $f_m^1 = \cdots = f_m^K = f_m$ for $m = 0, 1, \ldots, M-1$. Fix a state $m$, and let $\phi_m^{\max} = \sum_j p_m^j \phi_m^j$ be the maximin quantizer in the corresponding single-sensor case where the system has only one sensor and the raw data are distributed according to $\{f_m\}$. Define the randomized quantizer vector $\boldsymbol{\phi}_m^* = \sum_j p_m^j \boldsymbol{\phi}_m^j$ with each $\boldsymbol{\phi}_m^j$ being a $K$-time replication of $\phi_m^j$, i.e., $\boldsymbol{\phi}_m^j = (\phi_m^j, \ldots, \phi_m^j)$. Then $\boldsymbol{\phi}_m^*$ is a maximin quantizer vector for the state $m$.

Proof: The proof follows at once from (25) and (26). ■
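The additivity in (25) and the averaging in (26) can be sketched in a few lines of Python. The sketch assumes unit-variance normal observations and threshold quantizers $I(X > \lambda)$ at every sensor; the function names and numerical values are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kl_threshold(theta_m, theta_alt, lam):
    """Single-sensor I(m, m'; phi) for phi(X) = I(X > lam), X ~ N(theta, 1)."""
    p = 1.0 - norm_cdf(lam - theta_m)
    q = 1.0 - norm_cdf(lam - theta_alt)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_vector(theta_m, theta_alt, thresholds):
    """Equation (25): divergences add up across independent sensors."""
    return sum(kl_threshold(theta_m, theta_alt, lam) for lam in thresholds)

def kl_randomized_vector(theta_m, theta_alt, mixture):
    """Equation (26): mixture is a list of (p_j, thresholds_j) pairs."""
    return sum(p * kl_vector(theta_m, theta_alt, thr) for p, thr in mixture)

single = kl_threshold(-0.5, 0.0, -0.3963)
# K = 2 homogeneous sensors, each replicating the same quantizer.
replicated = kl_vector(-0.5, 0.0, [-0.3963, -0.3963])
print(single, replicated)
```

With $K$ identical sensors each pairwise divergence is multiplied by $K$, which is why replicating the single-sensor maximin quantizer remains maximin in Proposition 5.1.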



C. Composite Multihypothesis Testing 

Our theory can also be extended to the scenario of composite hypotheses with finitely many points. Suppose that there are $B$ composite hypotheses, $H_0, \ldots, H_{B-1}$, where

$$H_b = \{P_{i_b}, P_{i_b+1}, \ldots, P_{i_{b+1}-1}\}$$

includes $i_{b+1} - i_b$ points for $b = 0, 1, \ldots, B-1$, and $i_0 = 0$. Without loss of generality, let us assume $M = i_B$. Then there are a total of $i_B = M$ simple hypotheses, and the decision maker is required to pick the one of the $B$ hypotheses that most likely includes the true state of nature $P_m$. Hence, the problem formulation is the same as that in Section II, except that the cost function $W(m, m')$ needs to be redefined to reflect composite hypotheses in the multihypothesis testing problem. To simplify our notation, for $m = 0, 1, \ldots, M-1$, denote by $[m]$ the hypothesis that $P_m$ is in, i.e., $[m] = H_b$ if and only if $P_m \in H_b$. In the composite multihypothesis testing problem, the loss function $W$ has the form $\{W(m, [m'])\}$, where $W(m, [m'])$ indicates the loss caused by making a decision $D = [m']$ when the state of nature is $P_m$. We assume $W(m, [m']) \ge 0$ and $W(m, [m']) = 0$ if and only if $m \in [m']$, i.e., there is no loss in making a correct decision.

As in Section II, the total expected cost or risk of a test $\delta$ when the true state of nature is $m$ is:

$$R_c(\delta; m) = c E_m\{N\} + \sum_{[m']} W(m, [m']) P_m\{D = [m']\}$$

and the Bayes risk of $\delta$ is

$$R_c(\delta) = \sum_m \pi_m R_c(\delta; m) \qquad (27)$$

where the prior probability of the hypothesis $H_b$ is $\pi_{i_b} + \ldots + \pi_{i_{b+1}-1}$.

In the scenario of composite hypotheses, the definition of the two-stage tests is similar except for a slight modification of the stopping time $N$ and the final decision $D$ of the fusion center in the second stage. For simplicity, let us consider the simplest case of a single sensor and binary sensor messages. At time step $n$ in the second stage, the fusion center computes

$$r_{[m],n} = \sum_{m'} \pi_{m',n} W(m', [m])$$

which is the average loss if one makes a final decision $D = [m]$. Then the fusion center stops at time $N = \min\{N_{[m]}\}$, where

$$N_{[m]} = \inf\{n \ge N_0 : r_{[m],n} \le c\}$$

and $N_0$ is the stopping time for the first stage. When stopped, the fusion center makes a final decision $D = [m]$ if $N = N_{[m]}$. Note that we do not change the fusion center policies in the first stage, i.e., the preliminary decision $D_0$ at the fusion center still picks the most promising state among the $M$ states instead of picking one of the $B$ hypotheses.
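The second-stage stopping check described above can be sketched as follows. This is a minimal Python illustration, assuming the posterior vector at the current time step is already available; the function name, the loss matrix layout `loss[m][b]` for $W(m, H_b)$, and the tie-breaking choice of the smallest average loss are assumptions of the sketch.

```python
def composite_stop(posterior, loss, c):
    """Return (b, r) if some hypothesis H_b has average loss r <= c, else None.

    posterior[m] is the posterior probability of state m at the current
    time step, and loss[m][b] stands for W(m, H_b).
    """
    num_hyp = len(loss[0])
    # r_b = sum over states of posterior * loss for declaring H_b.
    risks = [sum(pi * row[b] for pi, row in zip(posterior, loss))
             for b in range(num_hyp)]
    best = min(range(num_hyp), key=risks.__getitem__)
    if risks[best] <= c:
        return best, risks[best]
    return None

# Three states grouped as H_0 = {P_0, P_1} and H_1 = {P_2}, 0-1 loss.
loss = [[0.0, 1.0],
        [0.0, 1.0],
        [1.0, 0.0]]
decided = composite_stop([0.001, 0.002, 0.997], loss, c=3.6e-3)
undecided = composite_stop([0.3, 0.3, 0.4], loss, c=3.6e-3)
print(decided, undecided)
```

In the first call the posterior mass outside $H_1$ is about 0.003, which is below the cost per step, so the sketch stops and declares $H_1$; in the second call it keeps sampling.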

To find the asymptotically optimal tests among the two-stage tests, we need to modify the definition of the information number $I(m; \phi)$ as follows:

$$I(m; \phi) = \min_{m' \notin [m]} I(m, m'; \phi), \qquad m = 0, 1, \ldots, M-1$$

that is, when taking the minimum, we ignore those states grouped into the same hypothesis as $m$. With these new definitions, Theorems 4.1 and 4.2 remain valid, and we can still use Theorem 4.3 to numerically compute each maximin quantizer $\phi_m^{\max}$ by pretending $[m] = \{P_m\}$, i.e., by temporarily discarding other states in $[m]$.
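The modified information number only changes which alternatives enter the minimum. A minimal Python sketch, with a hypothetical row of pairwise divergences and a hypothetical grouping of four states into two hypotheses:

```python
def composite_information(kl_row, groups, m):
    """min over states m' lying in a different hypothesis than m.

    kl_row[alt] stands for I(m, alt; phi); groups[k] is the index of the
    composite hypothesis containing state k.
    """
    return min(value for alt, value in enumerate(kl_row)
               if groups[alt] != groups[m])

# Hypothetical divergences of states 0..3 from state 0 under some quantizer.
kl_row = [0.0, 0.01, 0.08, 0.30]
groups = [0, 0, 1, 1]          # H_0 = {P_0, P_1}, H_1 = {P_2, P_3}
info = composite_information(kl_row, groups, 0)
print(info)  # 0.08
```

State 1 is very close to state 0 but belongs to the same hypothesis, so it does not limit the information number; only the states of the other hypothesis do.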



VI. Examples 

In this section we illustrate our theory via a numerical simulation study. Suppose we are interested in testing the mean of a normal distribution with unit variance in a network with a single sensor and binary sensor messages. That is, the raw data observed at the local sensor follow a normal distribution $P \sim N(\theta, 1)$. In the problem of testing three hypotheses regarding $\theta$, say, $H_0 : \theta = \theta_0$, $H_1 : \theta = \theta_1$ and $H_2 : \theta = \theta_2$, we assign the prior probability of 1/3 to each of these three hypotheses, and as in Draglin et al. [6], we also assume 0-1 loss for decision-making, i.e., $W(m, m') = 1$ if $m \ne m'$ and $= 0$ if $m = m'$. Two different scenarios will be considered:

1) Asymmetric (HT1): $(\theta_0, \theta_1, \theta_2) = (-0.5, 0, 1)$.

2) Symmetric (HT2): $(\theta_0, \theta_1, \theta_2) = (-0.5, 0, 0.5)$.

For our proposed asymptotically optimal decentralized test $\delta_A$ in these scenarios, it suffices to determine the local quantizers. The stationary quantizer in the first stage of $\delta_A$ is easy, as we can simply use $\phi^0(X) = I(X > 0)$, which satisfies the conditions in Subsection III-A. It is a little more challenging to characterize the maximin quantizers used in the second stage of $\delta_A$. For the asymmetric case (HT1), it is straightforward to show from Lemma 4.1 that the three maximin quantizers are all deterministic MLRQs. Numerical computations show that the three maximin quantizers are $\phi_0 = I(X > -0.3963)$, $\phi_1 = I(X > -0.1037)$, $\phi_2 = I(X > 0.7941)$, and the corresponding maximin information numbers are $I_0 = 0.0796$, $I_1 = 0.0796$, $I_2 = 0.3186$, respectively.
The maximin quantizers of the symmetric case (HT2) are a little tricky. It is easy to check that Lemma 4.1 can be applied to states $m = 0$ and $m = 2$, yielding two maximin quantizers $\phi_0 = I(X > -0.3963)$ and $\phi_2 = I(X > 0.3963)$ with maximin information numbers $I_0 = I_2 = 0.07959$. However, we need to pay special attention to the maximin quantizer for the state $m = 1$ since the other two states $m = 0$ and $m = 2$ are symmetric with respect to $m = 1$. Since the three pdfs are obviously linearly independent as defined in Subsection IV-B, by Theorem 4.3 the maximin quantizer for state $m = 1$ can be realized as a randomization of at most two ULQs. The following lemma, whose proof is straightforward and thus omitted, gives more convenient descriptions of the ULQs in (HT2) when the observations are normally distributed.

Lemma 6.1. For the symmetric case (HT2), up to a permutation of the values it takes, a ULQ always takes one of the following two forms: $I(X > \lambda)$ or $I(\lambda_1 < X < \lambda_2)$, where $\lambda$ and $\lambda_1 < \lambda_2$ are real numbers.

This allows us to do numerical computation of the maximin quantizer for state $m = 1$ as in Subsection IV-B. Numerical computations show that the maximin quantizer for state $m = 1$ is also the deterministic quantizer defined by $\phi_1 = I(X > 0)$ up to the precision of 5 decimal digits, and $I_1 = 0.07928$.

For each of the two scenarios, (HT1) and (HT2), we will consider two versions of our proposed tests: one is $\delta_A(c)$ for the system with a single sensor, and the other is $\delta'_A(c)$ for the system with two independent and identical sensors. As a comparison with our proposed tests, we also consider an asymptotically optimal centralized test $\delta_a$ proposed in Draglin et al. [5], [6] for the system with a single sensor (we omitted another family of asymptotically optimal centralized tests $\delta_b$ proposed in Draglin et al. [5], [6], since its performance is similar to that of $\delta_a$). For $\delta_a$, the fusion center updates the posterior distribution $\{\pi_{m,n}\}$ based on the raw data $\{X_n\}$ and its stopping time is defined as $N(a) = \min_{0 \le m \le M-1} N_m(a)$, where $N_m(a) = \inf\{n \ge 1 : \pi_{m,n} \ge A_m\}$. In other words, $\delta_a$ stops as soon as one of the posterior probabilities $\pi_{m,n}$ passes the threshold $A_m$, which can take different values for different $m$. In the numerical simulation given in [6], the values of these thresholds are as follows. For the asymmetric case (HT1), $A_0 = A_1 = 1 - 3.99 \times 10^{-3}$, $A_2 = 1 - 5.33 \times 10^{-3}$. For the symmetric case (HT2), $A_0 = A_1 = A_2 = 1 - 3.99 \times 10^{-3}$. These particular values for the thresholds tune the overall probabilities of making incorrect decisions with test $\delta_a$ to $(1.0 \pm 0.1) \times 10^{-3}$.

TABLE I: Expected values of time steps taken for each of the three tests.

E_m{N}              $\delta_a$    $\delta_A(c)$    $\delta'_A(c)$
Asymmetric (HT1)
  m = 0             46.48         73.5±0.9         36.8±0.7
  m = 1             48.39         77.7±0.9         38.9±0.7
  m = 2             11.90         19.8±0.2         9.9±0.1
Symmetric (HT2)
  m = 0             46.59         73.4±0.9         37.8±0.6
  m = 1             69.43         110.2±0.9        55.2±0.7
  m = 2             46.60         73.4±0.9         37.8±0.6
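The stopping rule of the centralized test described above can be sketched as a posterior update followed by a threshold check. This is an illustrative Python sketch, assuming unit-variance normal observations; the function name and the constant input sequence are hypothetical and serve only to make the run deterministic.

```python
import math

def posterior_stop(xs, thetas, priors, thresholds):
    """Stop at the first n with posterior[m] >= thresholds[m] for some m.

    Observations are assumed N(theta_m, 1). Returns (n, m), or None if no
    threshold is crossed within the supplied data.
    """
    log_post = [math.log(p) for p in priors]
    for n, x in enumerate(xs, start=1):
        # Add the normal log-likelihood (constants cancel after normalizing).
        log_post = [lp - 0.5 * (x - th) ** 2
                    for lp, th in zip(log_post, thetas)]
        top = max(log_post)
        weights = [math.exp(lp - top) for lp in log_post]
        total = sum(weights)
        post = [w / total for w in weights]
        for m, (pm, am) in enumerate(zip(post, thresholds)):
            if pm >= am:
                return n, m
    return None

thetas = (-0.5, 0.0, 1.0)                              # (HT1)
thresholds = (1 - 3.99e-3, 1 - 3.99e-3, 1 - 5.33e-3)   # A_0, A_1, A_2
# Hypothetical data: every observation equals the mean of state 2.
result = posterior_stop([1.0] * 20, thetas, (1/3, 1/3, 1/3), thresholds)
print(result)  # (11, 2)
```

With real data the observations are random draws, so the stopping time varies from run to run; Table I reports its expectation.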

In our simulations, the cost is $c = 3.6 \times 10^{-3}$, and the threshold $u(c)$ at the first stage of our proposed tests $\delta_A(c)$ and $\delta'_A(c)$ is set as 0.1. Because of the selection of the parameters, $\delta_A$, $\delta'_A$, and $\delta_a$ have similar probabilities of making incorrect decisions, i.e., $(1.0 \pm 0.1) \times 10^{-3}$. Thus it suffices to report the simulated expected time steps $E_m\{N\}$ under each of the three hypotheses $H_m$ for $m = 0, 1, 2$, as smaller values of $E_m\{N\}$ imply better performance of the test (in the sense of smaller Bayes risks). These results are reported in Table I.
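For orientation, the first-order asymptotics of the two-stage test suggest that the expected sample size behaves like $|\log c| / I(m)$, and with $K$ identical sensors the information number is multiplied by $K$. The following Python sketch evaluates this crude approximation; it is not the simulation behind Table I, it ignores the first-stage time and all lower-order terms, and the function name is an assumption.

```python
import math

def first_order_steps(c, info, sensors=1):
    """First-order approximation |log c| / (K * I(m)) of E_m{N}."""
    return abs(math.log(c)) / (sensors * info)

c = 3.6e-3
one_sensor = first_order_steps(c, 0.0796)             # (HT1), m = 0
two_sensors = first_order_steps(c, 0.0796, sensors=2)
print(round(one_sensor, 2), round(two_sensors, 2))    # 70.69 35.34
```

These values are of the same order as the simulated 73.5 and 36.8 for $m = 0$ in (HT1). The approximation is visibly cruder in cases such as $m = 1$ in (HT2), where two alternatives are equally close.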

The numerical results illustrate that the centralized test, $\delta_a$, indeed performs better than the decentralized test $\delta_A(c)$ that makes a final decision based on binary sensor messages instead of raw normal observations. However, Table I demonstrates that for (HT1) and (HT2), if we are able to deploy merely one extra identical sensor, the decentralized test $\delta'_A(c)$ has smaller Bayes risk than the centralized test with a single sensor, not to mention other important benefits such as robustness and bandwidth saving capabilities. In other words, if designed appropriately, a decentralized test does not lose much information as compared to the centralized test, and in fact, a decentralized test with two sensors can outperform a centralized test with a single sensor.

VII. Conclusion 

We have developed a family of asymptotically optimal decentralized sequential tests when testing $M \ge 3$ hypotheses. The main idea is to consider "two-stage" tests in which one first uses a small portion of the total time steps to make a preliminary decision about the true state of nature, and then the local quantizer switches to the corresponding "maximin quantizer." Moreover, we show that each maximin quantizer can be realized (or approximated) as a randomization of at most $M-1$ ULQs, and we also illustrate how to compute maximin quantizers numerically.

There are several theoretical issues in sequential multihypothesis testing problems that deserve further research. Instead of first-order optimality, it will be interesting to investigate higher-order asymptotic optimality. It is expected that we need to extend our two-stage test $\delta_A(c)$ to more than two stages in order to achieve higher-order asymptotic optimality. In addition, it is interesting to see what happens if the sensor observations are no longer i.i.d., especially if they are dependent either over time or among different sensors.

Appendix A 
Proofs of Theorems 4.3 and 5.1

Since quantizers, especially randomized quantizers, play an important role in our theorems, we gather some useful results for quantizers in this appendix, including the proofs of Theorems 4.3 and 5.1. Without loss of generality, we assume that the quantized messages belong to a finite alphabet, say, $\{0, 1, \ldots, l-1\}$. For a (deterministic or randomized) quantizer $\phi \in \bar{\Phi}$, define its distribution vector as a vector of $Ml$ dimensions:

$$q(\phi) = \big(q(i; m, \phi)\big)_{0 \le i \le l-1;\ 0 \le m \le M-1}$$

where $q(i; m, \phi) = P_m(\phi(X) = i)$. Now let us consider four sets induced by the distribution vectors $q(\phi)$:

• Let $Q$ be the set formed by the distribution vectors of all deterministic quantizers, i.e., $Q = \{q(\phi) : \phi \in \Phi\}$;

• Let $\bar{Q} = \{q(\phi) : \phi \in \bar{\Phi}\}$ be the set formed by the distribution vectors of all quantizers, deterministic or random;

• Denote by $Q_U \subset Q$ the set of distribution vectors of all ULQs (see Definition 5.1);

• Denote by $Q_\alpha$ the set of extreme points of $\bar{Q}$.

By Tsitsiklis [23], $Q$ is compact and $\bar{Q}$ is the compact convex hull of $Q$. By the Krein-Milman theorem, the compact convex set $\bar{Q}$ is also the convex hull of its extreme points. Thus it is useful to characterize $Q_\alpha$. Tsitsiklis [23] showed that $Q_U \subset Q_\alpha \subset Q$, and $Q_U$ is a dense subset of $Q_\alpha$. Moreover, that work also studied in detail the case of testing $M = 2$ hypotheses. However, the case of $M \ge 3$ hypotheses is more challenging. Fortunately, below we are able to show that $Q_\alpha = Q_U$ for $M \ge 3$ hypotheses under some reasonable additional assumptions.

Lemma A.1. If the sensor messages are binary (i.e., $l = 2$) and the pdf's $\{f_0, \ldots, f_{M-1}\}$ are linearly independent (as defined in Subsection IV-B), then $Q_\alpha = Q_U$.



Proof: Since $Q_U$ is a dense subset of $Q_\alpha$, it is sufficient to show that if $q^0 \in Q_\alpha$, then $q^0 \in Q_U$. Since $Q_U$ is dense in $Q_\alpha$, there is a sequence of ULQs $\phi^j$, say, $\phi^j = I(\sum_m a_m^j f_m(X) > 0)$ with $\sum_m (a_m^j)^2 = 1$, such that $q(\phi^j) \to q^0$. By the Bolzano-Weierstrass theorem, each bounded sequence has a convergent subsequence. By passing to subsequences, we can simply assume that $a_m^j$ converges to $a_m^0$ for each state $m$, and so $\sum_m (a_m^0)^2 = 1$. By the condition of linear independence, $\phi^0(X) = I(\sum_m a_m^0 f_m(X) > 0)$ is a ULQ. It remains to show that $q^0 = q(\phi^0)$, or equivalently, to show that for each state $m$, $\lim_{j \to \infty} P_m(A_j) = 0$, where $A_j = A_j^1 \cup A_j^2$, and

$$A_j^1 = \Big\{X : \sum_m a_m^j f_m(X) \le 0 \text{ and } \sum_m a_m^0 f_m(X) > 0\Big\}$$

and

$$A_j^2 = \Big\{X : \sum_m a_m^j f_m(X) > 0 \text{ and } \sum_m a_m^0 f_m(X) \le 0\Big\}.$$

To prove this, without loss of generality, let us further assume that $0 \le f_m(X) \le 1$ for any state $m$, as we can always substitute $f_m(X)$ by $f_m(X) / \sum_{m'} f_{m'}(X)$. Define another sequence of sets $\{A'_j\}$ by $A'_j = \{X : |\sum_m a_m^0 f_m(X)| \le M \epsilon_j\}$, where $\epsilon_j = \max_m |a_m^0 - a_m^j|$. We claim that $A_j \subset A'_j$ for each $j$. Indeed, if $X \in A_j^1$, then $\sum_m a_m^j f_m(X) \le 0$ and

$$\sum_m a_m^0 f_m(X) \le \sum_m (a_m^0 - a_m^j) f_m(X) \le \sum_m |a_m^0 - a_m^j| \le M \epsilon_j$$

where the second inequality uses the assumption that $0 \le f_m(X) \le 1$. Moreover, if $X \in A_j^1$, then $\sum_m a_m^0 f_m(X) > 0$, and thus $A_j^1 \subset A'_j$. Similarly, $A_j^2 \subset A'_j$. So $A_j \subset A'_j$.

Let $A^0 = \bigcap_{i=1}^{\infty} \bigcup_{j=i}^{\infty} A'_j$. Since $a_m^j$ converges to $a_m^0$ for each state $m$, we have $\epsilon_j = \max_m |a_m^0 - a_m^j| \to 0$, and thus $A^0 = \{X : \sum_m a_m^0 f_m(X) = 0\}$. Because the pdf's are assumed to be linearly independent, $P_m(A^0) = 0$ for any state $m$. Hence, $\lim_{j \to \infty} P_m(A'_j) = 0$. So $\lim_{j \to \infty} P_m(A_j) = 0$, and the lemma is proved. ■



Now let us consider the K-L divergences for distribution vectors of quantizers. Given $q \in \bar{Q}$, say, $q = q(\phi)$, denote $q_{i,m} = q(i; m, \phi)$, where $i = 0, \ldots, l-1$ and $m = 0, \ldots, M-1$. For $0 \le m \ne m' \le M-1$, define the K-L divergence of the distribution vector $q$ of state $m'$ from state $m$ by

$$J(m, m'; q) = \sum_{i=0}^{l-1} q_{i,m} \log \frac{q_{i,m}}{q_{i,m'}} \qquad (28)$$

where, as is conventional, $0 \log \frac{0}{0} = 0$.

On the one hand, the definition of $J(m, m'; q)$ is standard and Tsitsiklis [23] showed that under Assumption 1, for any two states $m \ne m'$, the K-L divergence $J(m, m'; q)$ is bounded, continuous, and convex as a function of $q \in \bar{Q}$. On the other hand, for a randomized quantizer $\phi$, the definition of $J(m, m'; q(\phi))$ is equivalent to the K-L divergence defined in (8), not that in (9). Indeed, $J(m, m'; q(\phi)) \le I(m, m'; \phi)$ in (9), and thus it does not directly relate to the maximin information number $I(m)$ in Definition 4.1. Fortunately, the idea can be salvaged. To do so, let $\bar{\mathcal{M}}$ be the set of Borel probability measures on $\bar{Q}$, and for each $\mu \in \bar{\mathcal{M}}$ and two states $0 \le m \ne m' \le M-1$ define

$$J^*(m, m'; \mu) = \int J(m, m'; q)\, d\mu(q) \qquad (29)$$

and

$$J^*(m; \mu) = \min_{m' \ne m} J^*(m, m'; \mu). \qquad (30)$$

Then for a randomized quantizer $\phi \in \bar{\Phi}$, the K-L divergence defined in (9) is equivalent to $J^*(m, m'; \mu)$ for some suitably chosen $\mu$. To see this, note that $\phi$ assigns probability masses to a finite or countable subset of $\Phi$, and thus induces a probability measure $\mu(\phi)$ on $Q$. Hence, $I(m, m'; \phi) = J^*(m, m'; \mu(\phi))$ and

$$I(m; \phi) = J^*(m; \mu(\phi)). \qquad (31)$$
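The gap between the divergence of the averaged distribution vector and the averaged divergence is an instance of the convexity of the K-L divergence. The Python sketch below illustrates it for an equal randomization of two threshold quantizers, assuming unit-variance normal observations with means (-0.5, 0, 0.5); all names and numerical choices are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def dist_vector(thetas, lam):
    """Rows (P_m(U=0), P_m(U=1)) for the quantizer I(X > lam), X ~ N(theta_m, 1)."""
    return [(norm_cdf(lam - th), 1.0 - norm_cdf(lam - th)) for th in thetas]

def kl_from_vector(q, m, alt):
    """K-L divergence of row alt from row m of a distribution vector."""
    return sum(a * math.log(a / b) for a, b in zip(q[m], q[alt]))

thetas = (-0.5, 0.0, 0.5)
q1 = dist_vector(thetas, -0.1037)
q2 = dist_vector(thetas, 0.1037)
# Distribution vector of the randomized quantizer: average of q1 and q2.
q_mix = [tuple(0.5 * a + 0.5 * b for a, b in zip(r1, r2))
         for r1, r2 in zip(q1, q2)]

j_of_mixture = kl_from_vector(q_mix, 1, 0)        # divergence of the mixed vector
averaged = 0.5 * kl_from_vector(q1, 1, 0) + 0.5 * kl_from_vector(q2, 1, 0)
print(j_of_mixture, averaged)
```

The averaged divergence is what the fusion center actually collects when it knows which deterministic quantizer was used at each step, which is why the measure-based quantity in (29) is the appropriate object for the maximin problem.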

Our next result provides an alternative representation of the maximin information number $I(m)$ defined in Definition 4.1 in Subsection IV-A.

Lemma A.2. The maximin information number $I(m) = \sup_{\mu \in \bar{\mathcal{M}}} J^*(m; \mu) = \sup_{\mu \in \mathcal{M}} J^*(m; \mu)$, where $\mathcal{M} \subset \bar{\mathcal{M}}$ is the set of probability measures supported on $Q$.

Proof: Denote by $\mathcal{M}^0$ and $\bar{\mathcal{M}}^0$ the sets of probability measures on $Q$ and $\bar{Q}$ that have at most countable supports, respectively. By (31), $\sup_{\mu \in \mathcal{M}^0} J^*(m; \mu) = I(m)$, and thus

$$I(m) \le \sup_{\mu \in \mathcal{M}} J^*(m; \mu) \le \sup_{\mu \in \bar{\mathcal{M}}} J^*(m; \mu).$$

By Tsitsiklis [23], $J(m, m'; q)$ is bounded and continuous as a function of $q \in \bar{Q}$. Hence $J^*(m, m'; \mu)$ and $J^*(m; \mu)$ are also continuous when viewed as functions of $\mu \in \bar{\mathcal{M}}$ (under weak convergence). Thus the lemma follows at once from the denseness of $\mathcal{M}^0$ (or $\bar{\mathcal{M}}^0$) in $\mathcal{M}$ (or $\bar{\mathcal{M}}$), provided that $I(m) \ge \sup_{\mu \in \bar{\mathcal{M}}^0} J^*(m; \mu)$. Hence, it suffices to show that for each $\mu \in \bar{\mathcal{M}}^0$ there exists a $\mu' \in \mathcal{M}$ such that $J^*(m, m'; \mu) \le J^*(m, m'; \mu')$ for each $m' \ne m$. By linearity, we only need to prove it under the further assumption that $\mu \in \bar{\mathcal{M}}^0$ is supported on a single point $q = q(\phi)$ for a randomized quantizer $\phi \in \bar{\Phi}$. In this case $J^*(m, m'; \mu) = J(m, m'; q) \le I(m, m'; \phi)$. By our previous argument, $\phi$ can be identified with a probability measure $\mu' = \mu(\phi) \in \mathcal{M}$ with the property $I(m, m'; \phi) = J^*(m, m'; \mu')$. Therefore $J^*(m, m'; \mu) \le J^*(m, m'; \mu')$, completing the proof of the lemma. ■



Finally, we are in a position to prove Theorems 4.3 and 5.1.

Proofs of Theorem 4.3 and Theorem 5.1: Note that Theorem 4.3 is a special case of Theorem 5.1 and follows at once from Theorem 5.1 and Lemma A.1 under the assumption of binary sensor messages and linearly independent pdf's, in which case $Q_U = Q_\alpha$. By symmetry and the fact that $Q_U$ is a dense subset of $Q_\alpha$, it is sufficient to show that under the assumption of Theorem 5.1, for the state $m = 0$, there exists one maximin quantizer which is a randomization of at most $M-1$ quantizers with their distribution vectors in $Q_\alpha$.

Define two sets in $M-1$ dimensional space, $\mathcal{J} = \{(J(0, 1; q), \ldots, J(0, M-1; q)) : q \in \bar{Q}\}$, and $\mathcal{J}_\alpha = \{(J(0, 1; q), \ldots, J(0, M-1; q)) : q \in Q_\alpha\}$. Define the same for $\mathcal{J}^*$ and $\mathcal{J}^*_\alpha$ when $J(0, m; q)$ is replaced by $J^*(0, m; \mu)$ with $\mu \in \bar{\mathcal{M}}$ and $\mu \in \mathcal{M}_\alpha$, respectively, where $\mathcal{M}_\alpha$ is the set of probability measures supported in $Q_\alpha$. As we have mentioned earlier, $J(0, m; q)$ is continuous if viewed as a function of $q \in \bar{Q}$, so both $\mathcal{J}$ and $\mathcal{J}_\alpha$ are compact. Obviously, $\mathcal{J}^*$ and $\mathcal{J}^*_\alpha$ are convex hulls of $\mathcal{J}$ and $\mathcal{J}_\alpha$, so they are compact as well. The main idea of the proof is to relate the maximin information number $I(0)$ with the set $\mathcal{J}^*_\alpha$.

First, we claim that $I(0) = \sup_{J \in \mathcal{J}^*_\alpha} h(J)$, where $h(\cdot)$ is a function on $M-1$ dimensional space defined by $h(x_1, \ldots, x_{M-1}) = \min\{x_1, \ldots, x_{M-1}\}$. By Lemma A.2 we have $I(0) = \sup_{J \in \mathcal{J}^*} h(J)$. Since $\mathcal{J}^*_\alpha \subset \mathcal{J}^*$, to prove the claim, we only need to show that, for any $J \in \mathcal{J}^*$, there exists $J' \in \mathcal{J}^*_\alpha$ such that each component of $J$ is less than or equal to the corresponding component of $J'$. By linearity, it is sufficient to prove this for $J \in \mathcal{J}$, say, $J = (J(0, 1; q), \ldots, J(0, M-1; q))$ for some $q \in \bar{Q}$. Decompose $q$ as a convex combination of points in $Q_\alpha$: $q = \sum_j p^j q^j$; then, by convexity,

$$J(m, m'; q) \le \sum_j p^j J(m, m'; q^j), \qquad 0 \le m \ne m' \le M-1.$$

Let $J' = (J^*(0, 1; \mu), \ldots, J^*(0, M-1; \mu))$ where $\mu$ assigns probability mass $p^j$ to $q^j$ for each $j$, and our claim is justified.

Second, we will show that

$$\sup_{J \in \mathcal{J}^*_\alpha} h(J) = \min_{1 \le m \le M-1} J^*(0, m; \mu_0) \qquad (32)$$

for a probability measure $\mu_0 \in \mathcal{M}_\alpha$ whose support includes at most $M-1$ points. To see this, note that $\mathcal{J}^*_\alpha$ is a compact convex subset of $M-1$ dimensional space. Thus $h(\cdot)$ attains its maximum at a point $J^0$ on the surface of $\mathcal{J}^*_\alpha$, and $J^0$ can be realized as a convex combination of at most $M-1$ points in $\mathcal{J}_\alpha$; see, for example, Hörmander [9]. Suppose that $J^0 = \sum_{j=1}^{M-1} p_0^j J^j$ where $\sum_j p_0^j = 1$ and $J^j \in \mathcal{J}_\alpha$. For each $j$, let $J^j = (J(0, 1; q_0^j), \ldots, J(0, M-1; q_0^j))$, with $q_0^j \in Q_\alpha$. Define $\mu_0 \in \mathcal{M}_\alpha$ to be a probability measure such that $\mu_0(q_0^j) = p_0^j$ for $j = 1, \ldots, M-1$; then (32) holds.

Finally, define the randomized quantizer $\phi_0$ as the one induced by the measure $\mu_0$ in (32). Then $I(0) = \min_{m \ne 0}\{I(0, m; \phi_0)\}$ and $\phi_0$ can be rewritten as $\sum_{j=1}^{M-1} p_0^j \phi_0^j$, where $\phi_0^j$ has $q_0^j$ as its distribution vector. Equivalently, $\phi_0$ is just the maximin quantizer $\phi_0^{\max}$, and it can be taken as a randomization of at most $M-1$ quantizers with their distribution vectors in $Q_\alpha$. This completes our proof. ■

Appendix B 
Proof of Theorem 4.1

At each stage of our proposed two-stage test $\delta(c)$, since the local sensor uses stationary (though possibly randomized) quantizers, the sensor messages $U_n$'s are i.i.d. and the fusion center essentially faces the classical centralized sequential hypothesis testing problem. Thus Theorem 4.1 can be proved by standard arguments and by conditioning on the preliminary decision $D_0$ of the two-stage test $\delta(c)$. In the following we will focus on the proof of (16) to highlight the associated technical problems that need special attention. Denote by $N_0$ and $N_1$ the total time steps of the first and second stages of the two-stage test $\delta(c)$, respectively; then the total time step $N$ taken by $\delta(c)$ satisfies

$$\begin{aligned} E_m\{N\} &= E_m\{N_0\} + E_m\{N_1\} \\ &= E_m\{N_0\} + E_m\{N_1 \mid D_0 = m\} P_m\{D_0 = m\} + E_m\{N_1 1\{D_0 \ne m\}\}. \end{aligned}$$

By standard arguments for the classical centralized sequential multihypothesis testing problems, the stopping boundary of $1 - u(c)$ at the first stage guarantees that $P_m\{D_0 = m\} = 1 - O(u(c))$ and $E_m\{N_0\} = O(|\log u(c)|)$. Since $u(c) \to 0$ satisfies $|\log u(c)| / |\log c| \to 0$, e.g., $u(c) = 1/|\log c|$, we have $P_m\{D_0 = m\} = 1 - o(1)$ and $E_m\{N_0\} = o(|\log c|)$. Hence, equation (16) holds if we can further show that

$$E_m\{N_1 \mid D_0 = m\} = (1 + o(1)) |\log c| / I(m; \phi_m) \qquad (33)$$

and

$$E_m\{N_1 1\{D_0 \ne m\}\} = o(|\log c|). \qquad (34)$$

To prove (33) and (34), note that at time $n$ of the second stage of our proposed two-stage test $\delta(c)$, the log-likelihood ratio statistic of the latest sensor message at the fusion center is

$$\Delta Z_n(m, m'; \phi^{(n)}) = \log \frac{f_m(U_n \mid \phi^{(n)})}{f_{m'}(U_n \mid \phi^{(n)})}$$

where $\phi^{(n)}$ is the deterministic quantizer selected through the randomization at time step $n$ and $U_n = \phi^{(n)}(X_n)$ is the quantized sensor message. Hence, for our proposed two-stage test, the log-likelihood ratio statistic of all available sensor messages up to time $n$ is

$$Z_n(m, m'; \phi) = \sum_{i=1}^{n} \Delta Z_i(m, m'; \phi^{(i)}). \qquad (35)$$

Furthermore, since $\phi$ is assumed to be a randomization of a finite number of deterministic quantizers, our implementation of randomized quantizers implies that $\{\Delta Z_n(m, m'; \phi^{(n)}),\ n = 1, 2, \ldots\}$ is a sequence of i.i.d. random variables with mean $I(m, m'; \phi)$ in (9) and finite variance.

To prove (33), it is sufficient to show that

$$E_m\{N_1 \mid D_0 = m, \pi_{\cdot,N_0}\} = (1 + o(1)) |\log c| / I(m; \phi_m)$$

where $\pi_{\cdot,N_0} = (\pi_{0,N_0}, \ldots, \pi_{M-1,N_0})$ denotes the posterior distribution at time $N_0$ and the $o(1)$ term is uniform on the event $\{D_0 = m\}$ for any possible $\pi_{\cdot,N_0}$. This relation itself follows at once from the fact that $Z_n(m, m'; \phi)$ is the sum of i.i.d. random variables with mean $I(m, m'; \phi)$ in (9) and finite variance, but we need some extra work to prove the uniformity of the $o(1)$ term. For that purpose, given the state $m$, let $B_c = |\log c/(1-c)| + |\log(1 - u(c))|$ and consider the following stopping time:

$$T(B_c; \phi_m) = \inf\{n : \min_{m' \ne m} Z_n(m, m'; \phi_m) \ge B_c\} \qquad (36)$$

where $Z_n(m, m'; \phi_m)$ is the log-likelihood ratio in (35) except that the quantizer $\phi$ is now replaced by $\phi_m$ since we condition on $D_0 = m$. Clearly, under the conditional distribution $P_m\{\cdot \mid D_0 = m, \pi_{\cdot,N_0}\}$, the stopping time $N_1$ is dominated by $T(B_c; \phi_m)$, which does not depend on $\pi_{\cdot,N_0}$. By the law of large numbers, we have $E_m\{T(B_c; \phi_m)\}/B_c \to 1/I(m; \phi_m)$; also see Theorem 5.1 of Baum and Veeravalli [1]. Thus we obtain an $o(1)$ term for the $\le$ part of relation (33) due to the above arguments and the fact that $\log u(c) = o(|\log c|)$. The $\ge$ part of the relation can be proved similarly, and thus relation (33) holds.

The proof of (34) involves more technical details. It suffices to show that $E_m\{N_1 1\{D_0 = m'\}\} = o(|\log c|)$ for each $m' \ne m$. Now when $\{D_0 = m'\}$, our proposed two-stage procedure $\delta(c)$ uses the stationary (likely randomized) quantizer $\phi_{m'}$ at the second stage. Hence, we can define $Z_n(m, m'; \phi_{m'})$ as in (35) except that we now use the stationary quantizer $\phi_{m'}$. Likewise, define $T(B_c^*; \phi_{m'})$ as in (36) with $B_c^* = |\log c/(1-c)| + |\log \pi_m|$, where $\pi_m = \pi_{m,N_0}$ is the posterior probability of the $m$th hypothesis at time $N_0$. Then

$$\begin{aligned} E_m\{N_1 1\{D_0 = m'\}\} &\le E_m\{T(B_c^*; \phi_{m'}) 1\{D_0 = m'\}\} \\ &= E_m\{(1 + o(1)) B_c^* 1\{D_0 = m'\} / I(m; \phi_{m'})\} \\ &\le (1 + o(1)) / I(m; \phi_{m'}) \times E_m\{(|\log c/(1-c)| + |\log \pi_m|) 1\{D_0 = m'\}\} \\ &= O(|\log c|) P_m\{D_0 = m'\} + O(1) E_m\{|\log \pi_m| 1\{D_0 = m'\}\} \\ &= o(|\log c|) + O(1) E_m\{|\log \pi_m| 1\{D_0 = m'\}\}. \end{aligned}$$

Thus, to prove (34), it remains to show that $E_m\{|\log \pi_m| 1\{D_0 = m'\}\} = o(|\log c|)$ with $\pi_m = \pi_{m,N_0}$. Below we will prove a stronger statement that

$$E_m\{|\log \pi_{m,N_0}| 1\{D_0 \ne m\}\} = o(1).$$

By assumption, at time N , if D = to' then ir m > jy > 
1 - u(c) > 1/2. So 7T m ,jv < u(c) < 1/2 and for all L > 0, 



P m {| logx 



m,N 



> L, D Q ^ m} 



< 



< 



P, 
P„ 



log 1 Wm ' No >£-log2,D ^m 

, 1 - 7r„ 

sup log 



> £ - log 2 



< P„ 



sup — ^- exp{— Z n (m, to'; 0°)} 

n>l , , , Tra 



> e L /2 



< P m < min inf Z n (m, m ; (jr) 

m' im'^m n > 1 



< -L + lo; 



2(M - 1) 



Assume for a moment that the minimum $Z^* = \min_{m' : m' \ne m} \inf_{n \ge 1} Z_n(m, m'; \phi^0)$ is exponentially bounded in the sense that there exist constants $C_1 > 0$ and $0 < \rho < 1$ such that for any $L > 0$,

$$P_m\{Z^* \le -L\} \le C_1 \rho^L.$$

Then we have

$$P_m\{|\log \pi_{m,N_0}| > L,\ D_0 \ne m\} \le C_2 \rho^L \qquad (37)$$

with the constant $C_2 = C_1 \exp\big(-\log \rho \cdot \log \frac{2(M-1)}{\pi_m}\big)$. Consequently,

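$$
\begin{aligned}
E_m\{|\log \pi_{m,N_0}| \mathbf{1}\{D_0 \ne m\}\}
&= E_m\{|\log \pi_{m,N_0}| \mathbf{1}\{D_0 \ne m,\ |\log \pi_{m,N_0}| \ge |\log u(c)|\}\} \\
&\le C_2 \Big( |\log u(c)| \, \rho^{|\log u(c)|} + \int_{|\log u(c)|}^{\infty} \rho^L \, dL \Big) \\
&= C_2 \Big( |\log u(c)| + \frac{1}{|\log \rho|} \Big) \rho^{|\log u(c)|},
\end{aligned}
$$

which goes to 0 as $c \to 0$. Thus (34) is proved and the theorem holds.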

It remains to prove (37). Since the log-likelihood ratio statistic $Z_n(m, m'; \phi)$ in (35) is the sum of i.i.d. random variables with positive mean and finite variance under $P_m$, the minimum

$$Z^*_{m'} = \inf_{n \ge 0} Z_n(m, m'; \phi)$$

is a well-defined (non-positive valued) random variable under $P_m$. Moreover,



$$P_m\{Z^* \le -L\} \le \sum_{m' \ne m} P_m\{Z^*_{m'} \le -L\}.$$



Thus, to prove (37), it suffices to show that $Z^*_{m'}$ is exponentially bounded for each $m'$. Define a stopping time $\tau_- = \inf\{n : Z_n(m, m'; \phi) < 0\}$ and let $Y_1, Y_2, \ldots$ be i.i.d. random variables, where each $Y_i$ is distributed as $Z_{\tau_-}(m, m'; \phi)$ conditional on the event $\{\tau_- < \infty\}$. Then it is well known that $Z^*_{m'}$ has the same distribution as $\sum_{i=1}^{N} Y_i$, where $N$ is a geometric random variable independent of the $Y_i$ such that $P(N = n) = p(1-p)^n$ for $n = 0, 1, 2, \ldots$ with $p = P_m\{Z^*_{m'} = 0\} > 0$; see Klass [11], or Lemma 11.3 and Remark 11.3 of Gut [8]. Now in our case, since $U_n$ is discrete and $\phi$ is a randomization of a finite number of deterministic quantizers, $\Delta Z_n(m, m'; \phi)$ has a lower bound, say $-C$ for some $C > 0$. Thus $Y_1 = Z_{\tau_-}(m, m'; \phi)$ also has a lower bound $-C$. So

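$$
\begin{aligned}
P_m\{Z^*_{m'} \le -L\} &= P\Big(\sum_{i=1}^{N} Y_i \le -L\Big) \\
&\le P(N \ge L/C) \\
&\le (1-p)^{L/C},
\end{aligned}
$$

where the first inequality holds because $\sum_{i=1}^{N} Y_i \ge -CN$ and the last relation uses the fact that $N$ is geometrically distributed. Hence $Z^*_{m'}$ is exponentially bounded and the theorem holds. It is also instructive to compare $Z^*_{m'}$ with Brownian motion. Let $B(t)$ denote standard Brownian motion with mean zero and variance parameter 1. Then for all positive $L$, $\mu$ and $\sigma$,

$$P\Big(\inf_{t \ge 0}\{\sigma B(t) + \mu t\} \le -L\Big) = \exp(-2\mu\sigma^{-2}L).$$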
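As a concrete illustration of the geometric tail bound $(1-p)^{L/C}$, the sketch below replaces the log-likelihood ratio by a simple $\pm 1$ random walk with up-probability $q > 1/2$. This walk is a hypothetical stand-in, not the statistic from the paper: its increments are bounded below by $-C$ with $C = 1$, and by the classical gambler's ruin formula $p = P(Z^* = 0) = 1 - (1-q)/q$, so the bound is attained with equality. The function name `prob_min_below`, the value $q = 0.7$, and the horizon of 400 steps are assumptions made for the example.

```python
def prob_min_below(q, L, n_steps):
    """P(min_{0<=k<=n_steps} S_k <= -L) for a +/-1 walk with P(step = +1) = q.

    Computed exactly by dynamic programming over positions, with -L absorbing.
    """
    dist = {0: 1.0}      # position -> probability of being there without having hit -L
    absorbed = 0.0       # probability mass that has already reached -L
    for _ in range(n_steps):
        new = {}
        for pos, pr in dist.items():
            up = pos + 1
            new[up] = new.get(up, 0.0) + pr * q
            down = pos - 1
            if down <= -L:
                absorbed += pr * (1 - q)
            else:
                new[down] = new.get(down, 0.0) + pr * (1 - q)
        dist = new
    return absorbed

q = 0.7                      # hypothetical up-probability (positive drift)
p = 1 - (1 - q) / q          # P(walk never goes below 0), i.e. P(Z* = 0)
results = []
for L in (1, 2, 3):
    exact = prob_min_below(q, L, 400)   # finite-horizon value, essentially the limit
    bound = (1 - p) ** L                # (1 - p)^{L/C} with C = 1
    results.append((L, exact, bound))
```

With a positive drift the finite-horizon probabilities converge very quickly, so 400 steps are enough for the computed values to agree with $(1-p)^L$ to many digits. For walks whose increments are not exactly $-C$ the relation is only an inequality, as in the proof.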

Appendix C
Proof of Theorem 4.2



To prove Theorem 4.2, the main idea is to construct a martingale based on log-likelihood ratios and then apply the optional stopping theorem and Wald's inequalities. Since Theorem 4.2 deals with general decentralized sequential tests that may or may not implement randomized quantizers as we proposed for the two-stage tests, denote by $\phi_n$ the quantizer used at time step $n$ to the best knowledge of the fusion center. For example, when a randomized quantizer $\phi = \sum_j p_j \phi^j$ is implemented and the fusion center knows that the deterministic quantizer $\phi^j$ is picked at time step $n$, then $\phi_n = \phi^j$. Meanwhile, if the randomization is done at the local sensor and the fusion center has no access to which deterministic quantizer is picked, then $\phi_n = \phi$.

Let $U_n$ be the sensor message at time step $n$ and let $q(\phi_n)$ be the distribution vector of $\phi_n$. For $n = 1, 2, \ldots$, define $\mathcal{F}_{n-1}$ as the $\sigma$-algebra generated by $U_1, \ldots, U_{n-1}$ and $q(\phi_1), \ldots, q(\phi_n)$. In other words, $\mathcal{F}_{n-1}$ is all the past information available to the fusion center before the $n$th time step. Then at time step $n$, the log-likelihood ratio of state $m$ with respect to state $m'$ is $Z_n = \sum_{i=1}^{n} \Delta Z_i$, where

$$\Delta Z_i = \log \frac{f_m(U_i \mid \mathcal{F}_{i-1})}{f_{m'}(U_i \mid \mathcal{F}_{i-1})}$$

and $f_m(\cdot \mid \mathcal{F}_{i-1})$ is the conditional probability mass function induced on $U_i$ under $P_m$. Since $U_i$ depends on $\mathcal{F}_{i-1}$ only through $\phi_i$, $f_m(\cdot \mid \mathcal{F}_{i-1})$ is simply $f_m(\cdot; \phi_i)$ in (8), and thus $E_m\{\Delta Z_i \mid \mathcal{F}_{i-1}\} = J(m, m'; q(\phi_i))$ in (28). Therefore,

$$M_n = \sum_{i=1}^{n} \big[\Delta Z_i - J(m, m'; q(\phi_i))\big] = Z_n - \sum_{i=1}^{n} J(m, m'; q(\phi_i))$$

forms a martingale under $P_m$ with respect to $\{\mathcal{F}_n\}$. Applying the optional stopping theorem to the martingale $\{M_n; \mathcal{F}_n\}$, for the stopping time $N$ of a decentralized test $\delta(c)$, we have $E_m(M_N) = 0$, or equivalently,

$$E_m\{Z_N\} = E_m\Big\{\sum_{i=1}^{N} J(m, m'; q(\phi_i))\Big\}. \qquad (38)$$

Now let us go back to the proof of Theorem 4.2. Obviously, for a decentralized test $\delta(c)$ satisfying the error probability assumption in Theorem 4.2, if the sample size $N$ satisfies $E_m\{N\} = \infty$, then Theorem 4.2 holds. Thus we only need to consider the case when $E_m(N) < \infty$. To derive the asymptotic lower bound on $E_m(N)$, we construct a new test $\delta'(c)$ that accepts $H_m$ if the final decision of $\delta(c)$ is $D = m$ but accepts $H_{m'}$ (for a given $m' \ne m$) if $D \ne m$. Then this new test $\delta'(c)$ is a well-defined sequential test in the problem of testing a simple hypothesis $H_m$ against a simple alternative $H_{m'}$.

Moreover, the assumption of Theorem 4.2 guarantees that both type I and type II errors of $\delta'(c)$ are less than $\alpha_c = Ac|\log c|$, where $A > 0$ is a constant. Hence, $Z_N$ represents the log-likelihood ratio of the test $\delta'(c)$ when stopped and by Wald's inequalities (also see Theorem 2.39 of Siegmund [24]),

$$
\begin{aligned}
E_m\{Z_N\} &\ge (1 - \alpha_c) \log \frac{1 - \alpha_c}{\alpha_c} + \alpha_c \log \frac{\alpha_c}{1 - \alpha_c} \\
&\ge (1 - \alpha_c) |\log \alpha_c| - \log 2 \\
&= |\log c| - \log|\log c| + O(1)
\end{aligned}
$$

as $c \to 0$, where the $O(1)$ term depends only on $A$. Here the second inequality follows from the facts that $\alpha \log(1 - \alpha)^{-1}$ is nonnegative and that $(1 - \alpha)\log(1 - \alpha) + \alpha \log \alpha$ attains minimum value $-\log 2$ when $\alpha = 1/2$. By (38), we have

$$E_m\Big\{\sum_{i=1}^{N} J(m, m'; q(\phi_i))\Big\} \ge |\log c| - \log|\log c| + O(1). \qquad (39)$$

Now we claim that the left-hand side of (39) can be rewritten as $J^*(m, m'; \mu_m) E_m\{N\}$ for a suitably chosen probability measure $\mu_m$ on $Q$, where $J^*(m, m'; \mu_m)$ is defined as in (29). Then the theorem follows at once from this claim, relation (39), and Lemma A.2. It remains to prove this claim. To do so, define $\mu_m$ as a convex combination of a sequence of probability measures $\{\mu_{m,n,i} : i \le n\}$ as follows.

$$\mu_m = \sum_{n=1}^{\infty} \sum_{i=1}^{n} \frac{P_m\{N = n\}}{E_m\{N\}} \, \mu_{m,n,i}.$$

Then let $\mu_{m,n,i}$ be the distribution of $q(\phi_i)$ under $P_m$ and conditioned on the event $N = n$. In other words, for any Borel set $A \subset Q$, $\mu_{m,n,i}(A) = P_m\{q(\phi_i) \in A \mid N = n\}$. We have

$$
\begin{aligned}
E_m\{N\} \, J^*(m, m'; \mu_m)
&= \sum_{n=1}^{\infty} \sum_{i=1}^{n} P_m\{N = n\} \, J^*(m, m'; \mu_{m,n,i}) \\
&= \sum_{n=1}^{\infty} \sum_{i=1}^{n} P_m\{N = n\} \, E_m\{J(m, m'; q(\phi_i)) \mid N = n\} \\
&= \sum_{n=1}^{\infty} \sum_{i=1}^{n} E_m\{J(m, m'; q(\phi_i)) \mathbf{1}\{N = n\}\} \\
&= E_m\Big\{\sum_{i=1}^{N} J(m, m'; q(\phi_i))\Big\}.
\end{aligned}
$$
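Returning to the Wald-type bound used to obtain (39), the second inequality and the expansion $|\log c| - \log|\log c| + O(1)$ can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the grid of $\alpha$ values, and the particular constants $A = 2$ and $c = 10^{-12}$ are hypothetical and do not appear in the paper.

```python
import math

def wald_bound(alpha):
    """(1-a) log((1-a)/a) + a log(a/(1-a)): Wald lower bound when both error rates are at most a."""
    return (1 - alpha) * math.log((1 - alpha) / alpha) + alpha * math.log(alpha / (1 - alpha))

def simplified_bound(alpha):
    """(1-a)|log a| - log 2, the weaker bound used in the proof."""
    return (1 - alpha) * abs(math.log(alpha)) - math.log(2)

def entropy_term(alpha):
    """(1-a) log(1-a) + a log a, which is minimized at a = 1/2 with value -log 2."""
    return (1 - alpha) * math.log(1 - alpha) + alpha * math.log(alpha)

grid = [k / 1000 for k in range(1, 1000)]            # alpha values in (0, 1)
gap_ok = all(wald_bound(a) >= simplified_bound(a) - 1e-12 for a in grid)
min_alpha = min(grid, key=entropy_term)              # grid point minimizing the entropy term

A, c = 2.0, 1e-12                                     # hypothetical constant A and small cost c
alpha_c = A * c * abs(math.log(c))
# Remainder after subtracting |log c| - log|log c|; it should be close to -log(2A).
remainder = simplified_bound(alpha_c) - (abs(math.log(c)) - math.log(abs(math.log(c))))
```

In this sketch the remainder is approximately $-\log(2A)$, which is consistent with the statement that the $O(1)$ term depends only on $A$.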
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